Chapter one


Introduction to functions 


A function is a rule which operates on one number to give another number. However, not every
rule describes a valid function. This unit explains how to see whether a given rule describes a
valid function, and introduces some of the mathematical terms associated with functions.


In order to master the techniques explained here it is vital that you undertake plenty of practice 
exercises so that they become second nature. 

After reading this text, and/or viewing the video tutorial on this topic, you should be able to: 

• recognise when a rule describes a valid function. 

• plot the graph of part of a function.

• find a suitable domain for a function, and find the corresponding range. 
1. What is a function?

Here is a definition of a function. 

A function is a rule which maps a number to another unique number. 

In other words, if we start off with an input, and we apply the function, we get an output. 

For example, we might have a function that added 3 to any number. So if we apply this function 
to the number 2, we get the number 5. If we apply this function to the number 8, we get the 
number 11. If we apply this function to the number x, we get the number x + 3. 

We can show this mathematically by writing 

f(x) = x + 3. 

The number x that we use for the input of the function is called the argument of the function. 
So if we choose an argument of 2, we get 

f(2) = 2 + 3 = 5.

If we choose an argument of 8, we get

f(8) = 8 + 3 = 11.

If we choose an argument of -6, we get

f(-6) = -6 + 3 = -3.

If we choose an argument of z, we get

f(z) = z + 3.

If we choose an argument of x^2, we get

f(x^2) = x^2 + 3.
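
The same rule can be tried out on a computer. The following is a minimal Python sketch, not part of the original unit; the name `f` simply mirrors the notation used here, and the loop variable `argument` is chosen for illustration.

```python
def f(x):
    """Apply the rule 'add 3' to the argument x."""
    return x + 3


# Evaluate the function at the numerical arguments used in the text.
for argument in (2, 8, -6):
    print(f"f({argument}) = {f(argument)}")
```

Each call takes one argument and returns exactly one output, which is the behaviour the definition of a function asks for.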

At first sight, it seems that we can pick any number we choose for the argument. However, that 
is not the case, as we shall see later. But because we do have some choice in the number we 
can pick, we call the argument the independent variable. The output of the function, e.g. f(x),
f(5), etc. depends upon the argument, and so this is called the dependent variable.


Key Point 

A function is a rule that maps a number to another unique number. 

The input to the function is called the independent variable, and is also called the argument of 
the function. The output of the function is called the dependent variable. 
2. Plotting the graph of a function

If we have a function given by a formula, we can try to plot its graph. Suppose, for example, 
that we have a function f defined by

f(x) = 3x^2 - 4.

The argument of the function (the independent variable) is x, and the output (the dependent
variable) is 3x^2 - 4. So we can calculate the output of the function for different arguments:


f(0) = 3 × 0^2 - 4 = -4
f(1) = 3 × 1^2 - 4 = -1
f(2) = 3 × 2^2 - 4 = 8
f(-1) = 3 × (-1)^2 - 4 = -1
f(-2) = 3 × (-2)^2 - 4 = 8.


We can put this information into a table to help us plot the graph of the function. 



x       f(x)
-2      8
-1      -1
0       -4
1       -1
2       8


We can use the graph of the function to find the output corresponding to a given argument. For 
instance, if we have an argument of 2, we start on the horizontal axis at the point where x = 2, 
and we follow the line up until we reach the graph. Then we follow the line across so that we 
can read off the value of f(x) on the vertical axis. In this case, the value of f(x) is 8. Of course
we already know this, because x = 2 is one of the values in our table. 
But we can also use the graph for values of x which are not in our table. If we have an argument
of 1.5, we follow the line up to the graph, and then across to the vertical axis. The result is a 
number between 2 and 3. 



If we want to calculate this number exactly, we can substitute 1.5 into the formula: 


f(1.5) = 3 × 1.5^2 - 4 = 2.75.
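
The table of values and the exact value at 1.5 can also be produced by a short program. This is an illustrative Python sketch rather than part of the unit; the names `f` and `table` are assumptions made for the example.

```python
def f(x):
    """The rule f(x) = 3x^2 - 4."""
    return 3 * x**2 - 4


# Table of values for the whole-number arguments from -2 to 2.
table = [(x, f(x)) for x in range(-2, 3)]
for x, y in table:
    print(x, y)

# Exact value at an argument that is not in the table.
print(f(1.5))
```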


3. When is a function valid? 


Our definition of a function says that it is a rule mapping a number to another unique number.
So we cannot have a function which gives two different outputs for the same argument. One
easy way to check this is from the graph of the function, by using a ruler. Align the ruler
vertically and slide it across the graph. If, at every argument in the domain, the ruler crosses
the graph exactly once, no more and no less, then the graph represents a valid function.

What happens if we try to define a function with more than one output for the same argument? 
Let's try an example. Suppose we try to define a function by saying that 


f(x) = √x.


In the same way as before, we can produce a table of results to help us plot the graph of the 
function: 

f(0) = 0

f(1) = ±1

f(2) = ±1.4 to 1 d.p.

f(3) = ±1.7 to 1 d.p.

f(4) = ±2.

(If we try to use any negative arguments, we end up in trouble because we are trying to find 
the square root of a negative number.) Plotting the results from the table, we get the following 
graph. 
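
One way to make the ruler check concrete is to look through a list of plotted points for an argument that appears with two different outputs. The sketch below is an illustration in Python, not part of the original unit; the helper name `is_valid_function` and the two point lists (which use only the exact values from the table) are assumptions.

```python
from collections import defaultdict


def is_valid_function(points):
    """Return True if no argument is paired with two different outputs."""
    outputs = defaultdict(set)
    for x, y in points:
        outputs[x].add(y)
    return all(len(ys) == 1 for ys in outputs.values())


# Points for the rule that takes both square roots.
both_roots = [(0, 0), (1, 1), (1, -1), (4, 2), (4, -2)]
# The same points keeping only the non-negative root.
positive_root = [(0, 0), (1, 1), (4, 2)]

print(is_valid_function(both_roots))     # False
print(is_valid_function(positive_root))  # True
```

A finite list of points can only show that a rule fails the check; it cannot prove validity for every argument, so this is a companion to the graph rather than a replacement for it.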
Using the ruler, it is quite clear that there are two values for all of the positive arguments. So
as it stands, this is not a valid function.

One way around this problem is to define √x to take only the positive values, or zero: this is
sometimes called the positive square root of x. However, there is still the issue that we cannot
choose a negative argument. So we should also choose to restrict the choice of argument to
positive values, or zero.

When considering these kinds of restrictions, it is important to use the right mathematical
language. The set of possible inputs is called the domain of the function, and the set
of corresponding outputs is called the range. In the example above, we have defined the function
as follows:

f(x) = √x,   x ≥ 0,   f(x) ≥ 0,

so that the domain of the function is the set of numbers x ≥ 0, and the range is the corresponding
set of numbers f(x) ≥ 0.


Key Point 

The domain of a function is the set of possible inputs. The range of a function is the set of 
corresponding outputs. 
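
A domain restriction can be written into a program as an explicit check. The following Python sketch is an assumption-laden illustration, not part of the unit: it models the positive square root on the domain x ≥ 0 and rejects any other argument with a `ValueError`.

```python
import math


def f(x):
    """Positive square root, defined only on the domain x >= 0."""
    if x < 0:
        raise ValueError("x is outside the domain x >= 0")
    # math.sqrt returns the non-negative root, so the output is >= 0.
    return math.sqrt(x)


print(f(4))
print(f(0))
```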
4. Some further examples


Example 

Consider the function 

f(x) = 2x^2 - 3x + 5.

To make sure that the function is valid, we need to check whether we get exactly one output 
for each input, and whether there needs to be any restriction on the domain. As before, we can 
calculate the output of this function at some specific values to help us with plotting our graph: 

f(0) = 2 × 0^2 - 3 × 0 + 5
     = 5,

f(1) = 2 × 1^2 - 3 × 1 + 5
     = 2 - 3 + 5
     = 4,

f(2) = 2 × 2^2 - 3 × 2 + 5
     = 8 - 6 + 5
     = 7,

f(3) = 2 × 3^2 - 3 × 3 + 5
     = 18 - 9 + 5
     = 14,

f(-1) = 2 × (-1)^2 - 3 × (-1) + 5
      = 2 + 3 + 5
      = 10.


Now we can put this into a table, and plot the graph. 



x       f(x)
-1      10
0       5
1       4
2       7
3       14
A vertical ruler always crosses the graph once, and so the domain needs no restrictions, and
the function is valid. We can also see from the graph that the minimum output occurs when
x = 0.75, and that is when f(x) = 3.875. So the range of the function is f(x) ≥ 3.875.
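
The minimum can also be located without reading it off the graph. This Python sketch assumes the standard vertex formula x = -b/(2a) for a quadratic ax^2 + bx + c, which this unit does not state; the names `a`, `b`, `x_min` and `f_min` are chosen for the example, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction


def f(x):
    """The rule f(x) = 2x^2 - 3x + 5."""
    return 2 * x**2 - 3 * x + 5


a, b = 2, -3                 # coefficients of x^2 and x
x_min = Fraction(-b, 2 * a)  # argument at which the minimum occurs
f_min = f(x_min)             # smallest output, i.e. the bottom of the range

print(x_min, f_min, float(f_min))
```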


Example 

What about the function 

f(x) = 1/x ?

As usual, the first step is to check some values. 


f(1) = 1,   f(2) = 1/2,   f(3) = 1/3,   f(4) = 1/4,

f(-1) = 1/(-1) = -1,     f(-2) = 1/(-2) = -1/2,
f(-3) = 1/(-3) = -1/3,   f(-4) = 1/(-4) = -1/4.


When we try to calculate f(0) we have a problem, because we cannot divide by zero. So we
have to restrict the domain to exclude x = 0. 



Because of this problem when x = 0, we have to restrict the domain to make the function valid. 
You can also see from the graph that there is no value of x where f(x) = 0, so zero is also 
excluded from the range. The function is therefore defined by 


f(x) = 1/x,   x ≠ 0,   f(x) ≠ 0.


You might want to know what exactly is going on at this point when x = 0. One way to find
out is to look at what is happening very close to zero. So let's try some positive values for the 
argument getting closer and closer to zero, in order to see what happens: 


f(1) = 1,

f(1/10) = 1/(1/10) = 10,

f(1/1,000) = 1/(1/1,000) = 1,000,

f(1/1,000,000) = 1/(1/1,000,000) = 1,000,000.


So you can see that as we approach zero from the right, the output approaches infinity. 
Now let's try some negative values for the argument, getting closer and closer to zero from the
left-hand side, in order to see what happens: 


f(-1) = 1/(-1) = -1,

f(-1/2) = 1/(-1/2) = -2,

f(-1/10) = 1/(-1/10) = -10,

f(-1/1,000) = 1/(-1/1,000) = -1,000,

f(-1/1,000,000) = 1/(-1/1,000,000) = -1,000,000.


You can see that as we approach zero from the left, the output approaches negative infinity. So 
in this case, approaching zero from the left is very different from approaching it from the right. 
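
The two one-sided calculations can be repeated mechanically. The sketch below is a hedged Python illustration, not part of the unit; it uses exact fractions, and the names `steps`, `from_right` and `from_left` are assumptions.

```python
from fractions import Fraction


def f(x):
    """The rule f(x) = 1/x, with x = 0 excluded from the domain."""
    if x == 0:
        raise ZeroDivisionError("x = 0 is excluded from the domain")
    return 1 / x


# Positive arguments shrinking towards zero: 1, 1/10, 1/1,000, 1/1,000,000.
steps = [Fraction(1, 10**k) for k in (0, 1, 3, 6)]

from_right = [f(x) for x in steps]   # outputs grow without bound
from_left = [f(-x) for x in steps]   # outputs fall without bound

print(from_right)
print(from_left)
```

The outputs on the two sides have the same size but opposite signs, which is why the two approaches to zero behave so differently.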


Example 

For our final example, take the function 

f(x) = 1/(x - 2)^2.


As usual, we must calculate some values: 


f(-2) = 1/(-2 - 2)^2 = 1/16

f(-1) = 1/(-1 - 2)^2 = 1/9

f(0) = 1/(0 - 2)^2 = 1/4

f(1) = 1/(1 - 2)^2 = 1

f(2) = 1/(2 - 2)^2 = dividing by zero

f(3) = 1/(3 - 2)^2 = 1

f(4) = 1/(4 - 2)^2 = 1/4

f(5) = 1/(5 - 2)^2 = 1/9

f(6) = 1/(6 - 2)^2 = 1/16.


Now we can construct a table, and plot the graph of the function. 


x       f(x)
-2      1/16
-1      1/9
0       1/4
1       1
2       dividing by zero
3       1
4       1/4
5       1/9
6       1/16
The vertical line at x = 2 is called an asymptote; it is a line which is approached by the graph but
never reached. Although it is clear that we will have to omit x = 2 from the domain, this time the
situation is slightly different. You can see that on either side of x = 2, f(x) is approaching infinity.
You should also notice that we cannot have any negative values for f(x), or even f(x) = 0. So
we have to define our function as

f(x) = 1/(x - 2)^2,   x ≠ 2,   f(x) > 0.


Exercises 

1. Consider the function 

f(x) = 2x^2 + 5x - 3.

(a) Write down the argument of this function. 

(b) Write down the dependent variable in terms of the argument. 

(c) Use a table of values to help you plot the graph of the function. 

(d) From your graph, estimate f(1.5).

(e) Use your function to calculate f(1.5) exactly.

(f) Write down the domain and range of the function. 

(g) Re-write the function with argument y. 

2. Consider the function 

f(x) = 1/(x - 3)^2.

(a) Plot the graph of the function. 

(b) Write down the domain and range of the function. 

(c) Re-write the function with argument z. 

(d) Use your graph to estimate f(1).

(e) Use the function to calculate f(1) exactly.

(f) Write down another function where x = 4 has to be omitted from the domain. 

3. Consider the function 

f(x) = 3√x.

(a) What assumptions must be made about the function to ensure validity? 

(b) Plot the graph of the function. 
(c) Write down the domain and range of the function.

(d) Calculate f(3).

(e) What happens if you try to calculate f(-2)?

4. Consider the function 

f(x) = 1/x.

(a) Plot the graph of the function. 

(b) Write down the domain and range of the function. 

(c) What happens to the output of the function as the argument approaches zero? 

(d) Is approaching zero from the left different to approaching zero from the right? If yes, why? 

(e) Calculate f(2), f(-10) and f(z^3).

5. In the following list, you should write down the domain and range for each function, and then 
pair up functions that share the same domain and range. 


f(x) = 2 sin x - 1

f(x) = x^2 - 6x + 9

f(x) = 2e^(-x)

f(x) = 4 - x^2

f(x) = 2x - x^2 + 3

f(x) = 3e^(5x)

f(x) = 2 cos 2x - 1

f(x) = 4x^2 - 16x + 16


Answers 

1.

(a) The argument is x.

(b) The dependent variable is 2x^2 + 5x - 3 in terms of x.

(c) An example of a table of values might be

x       -4   -3   -2   -1    0    1    2
f(x)     9    0   -5   -6   -3    4   15

(d) Draw a line up from x = 1.5, and read off the value on the y-axis.

(e) f(1.5) = 9.

(f) Reading off from the graph, the range is f(x) ≥ -6.125, and the domain is -∞ < x < ∞.

(g) The function is f(y) = 2y^2 + 5y - 3 using an argument y.
2.

(a)

(b) The domain is x ≠ 3, the range is f(x) > 0.

(c) The function is f(z) = 1/(z - 3)^2 using an argument z.

(d) Draw a line up from x = 1, and read off the value on the y-axis.

(e) f(1) = 1/(1 - 3)^2 = 1/4.

(f) One such function is f(x) = 1/(x - 4), but there are many others.

3.

(a) We must assume that we take the positive square root, and that we take x ≥ 0.

(b)

(c) The domain is x ≥ 0, the range is f(x) ≥ 0.

(d) f(3) = 3√3 = 5.2 (to 1 d.p.).

(e) If you try to calculate f(-2), you will be attempting to find the square root of a negative
number. This has no real solutions.
4.

(a)

(b) The domain is x ≠ 0, the range is f(x) ≠ 0.

(c) The output approaches ±∞.

(d) Approaching from the left is different to approaching from the right. Approaching from the
left, the output decreases rapidly to negative infinity. Approaching from the right, the output
increases rapidly to positive infinity.

(e) f(2) = 1/2, f(-10) = -1/10, f(z^3) = 1/z^3.



5.

function                    domain          range
f(x) = 2 sin x - 1          -∞ < x < ∞      -3 ≤ f(x) ≤ 1
f(x) = x^2 - 6x + 9         -∞ < x < ∞      f(x) ≥ 0
f(x) = 2e^(-x)              -∞ < x < ∞      f(x) > 0
f(x) = 4 - x^2              -∞ < x < ∞      f(x) ≤ 4
f(x) = 2x - x^2 + 3         -∞ < x < ∞      f(x) ≤ 4
f(x) = 3e^(5x)              -∞ < x < ∞      f(x) > 0
f(x) = 2 cos 2x - 1         -∞ < x < ∞      -3 ≤ f(x) ≤ 1
f(x) = 4x^2 - 16x + 16      -∞ < x < ∞      f(x) ≥ 0

So the pairs are

f(x) = 2 sin x - 1 and f(x) = 2 cos 2x - 1,
f(x) = x^2 - 6x + 9 and f(x) = 4x^2 - 16x + 16,
f(x) = 2e^(-x) and f(x) = 3e^(5x),
f(x) = 4 - x^2 and f(x) = 2x - x^2 + 3.
Examples of Functions

This document provides examples of a variety of functions. The purpose is to
convince the beginning student that functions are something quite different from
polynomial equations.

These examples illustrate that the domain of functions need not be the real numbers or 
any other particular set. These examples illustrate that the rule for a function need not be 
an equation and may in fact be presented in a great variety of ways. 

Some of the functions receive more complete treatment/discussion than others. If a 
particular function or function type is normally a part of a College Algebra course, then a 
more complete discussion of that function and its properties is likely. 

Throughout this document the following definition of function, functional notation and 
convention regarding domains and ranges will be used. 

Definition: A function consists of three things:

i) A set called the domain 

ii) A set called the range 

iii) A rule which associates each element of the domain with a unique element of 
the range. 

Functional Notation: In the expression f(x), the symbol x is a domain element, and f(x) is
the unique range element which is associated with the domain element x.

Convention: When the domain or range of a function is not specified, it is assumed that 
the range is R, and the domain is the largest subset of R for which the rule makes sense. 
I) Some Very Simple Functions


The domains and rules of the three functions shown in
these diagrams are quite simple, but they are indeed
functions.

Moreover, the view of a function as two sets
with arrows from elements of the domain to elements
of the range will serve well to help understand
functions in general.

The example below shows quite clearly that the 
domain and range of a function need not be sets of 
numbers. In most College Algebra courses the 
functions discussed do have domains and ranges 
which are sets of numbers, but that is not a 
requirement of the definition of function. 
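
The three-part definition and the arrow picture can be modelled directly in a program. The following Python sketch is only an illustration: the sets `domain` and `range_`, the dictionary `rule` and the helper `is_function` are hypothetical names, and the elements are deliberately not numbers.

```python
# A domain and a range that are not sets of numbers.
domain = {"apple", "banana", "cherry"}
range_ = {"red", "yellow", "green"}

# The rule: one arrow from each domain element to a range element.
rule = {"apple": "red", "banana": "yellow", "cherry": "red"}


def is_function(domain, range_, rule):
    """Check that every domain element has an arrow and every arrow ends in the range.

    A dictionary stores one value per key, so each element has at most one arrow.
    """
    return set(rule) == domain and all(value in range_ for value in rule.values())


print(is_function(domain, range_, rule))
```

Two domain elements may share the same range element, as "apple" and "cherry" do here; what the definition rules out is one domain element with two arrows.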




II) Linear Functions 

Definition: A linear function is a function whose domain is R, whose range is R, and 
whose rule can be expressed as a linear equation. 

Recall a linear equation in two variables is an equation which can be written as 
y = mx + b where m and b are real numbers. 

This leads to the following alternate preferred definition for a linear function. 

Definition: A linear function is a function whose domain is R, whose range is R, and 
whose rule can be written as f(x) = mx + b where x is a domain element, m ∈ R and
b ∈ R.

Comment: Linear functions are a special case of the more general class of functions 
called polynomial functions. Thus the set of linear functions is a subset of the set of 
polynomial functions. 
Example: Let f be the function whose rule is f(x) = 3x + 5.

Convention dictates that the range of this function is R and because the rule makes sense
for every real number substituted into the rule, the domain is also R. The rule is written
as a linear equation, so it follows that f is a linear function.



Comment: The word linear is now being used as an important adjective in four distinct 
contexts. We speak of linear polynomials, linear equations in one variable, linear 
equations in two variables, and linear functions. 

• A linear polynomial is an expression which can be written as ax + b where a and b 
are real numbers. A linear polynomial is a mathematical creature just like natural 
numbers, rational numbers, irrational numbers, matrices, and vectors are 
mathematical creatures. There is no equation involved. The concept of solving a 
linear polynomial is meaningless. The arithmetic-like concepts of addition,
multiplication, and subtraction in the context of all polynomials as well as linear
polynomials do have meaning.

• A linear equation in one variable is an equation which can be written as ax + b = 0
where a and b are real numbers. The graph of a linear equation in one variable is
a single point on the x-axis unless a = 0 and b ≠ 0, in which case there are no
solutions and hence no graph.

• A linear equation in two variables is an equation which can be written in the form 
y = mx + b where m and b are real numbers. This is called the slope-intercept 
form of the equation of a line. The graph of a linear equation in two variables is a 
line. The coefficient m is the slope of the graph and b is the y-intercept of that 
graph. 

• A linear function is a function whose domain is R, whose range is R, and whose
rule may be written as f(x) = mx + b where m and b are real numbers. The graph
of a linear function is a non-vertical line with slope m and y-intercept b. A linear
function is a mathematical creature just like natural numbers, rational numbers,
irrational numbers, matrices, vectors, and linear polynomials are mathematical
creatures. The concept of solving a linear function is meaningless. The
arithmetic-like concepts of addition, multiplication, and subtraction in the context
of all functions as well as linear functions do have meaning. A variety of
additional facts peculiar to functions will be studied in this course.

III) Quadratic Functions

Definition: A quadratic function is a function whose domain is R, whose range is R, and 
whose rule can be expressed as a quadratic equation. 

Recall that a quadratic equation in two variables is an equation that can be written 
y = ax^2 + bx + c where a, b, and c are real numbers and a ≠ 0.

This leads to the following alternate preferred definition for a quadratic function. 

Definition: A quadratic function is a function whose domain is R, whose range is R, and 
whose rule can be written as f(x) = ax^2 + bx + c where a, b, and c are real numbers and
a ≠ 0.

Comment: Quadratic functions are a special case of the more general class of functions 
called polynomial functions. Thus the set of quadratic functions is a subset of the set of 
polynomial functions. 

Example: Let f be the function whose rule is f(x) = x^2 + x - 6.

The convention dictates that the range of this function is R and because the rule makes 
sense for every real number substituted into the rule, the domain is also R. The rule is 
written as a quadratic equation, so it follows that f is a quadratic function. 
Comment: The word quadratic is now being used as an important adjective in four
distinct contexts. We speak of quadratic polynomials, quadratic equations in one
variable, quadratic equations in two variables, and quadratic functions.

• A quadratic polynomial is an expression which can be written as ax^2 + bx + c
where a, b, and c are real numbers and a ≠ 0. A quadratic polynomial is a mathematical
creature just like natural numbers, rational numbers, irrational numbers, matrices, 
vectors, linear polynomials, and linear functions are mathematical creatures. 

There is no equation involved. The concept of solving a quadratic polynomial is 
meaningless. The arithmetic-like concepts of addition, multiplication, and 
subtraction in the context of all polynomials as well as quadratic polynomials do 
have meaning. 

• A quadratic equation in one variable is an equation which can be written as 
ax^2 + bx + c = 0 where a, b, and c are real numbers. The graph of a quadratic
equation in one variable may be a single point, or two points on the x-axis. If the
discriminant of the quadratic polynomial is negative the quadratic equation in one 
variable has no real solutions and hence has no graph. 

• A quadratic equation in two variables is an equation which can be written in the 
form y = ax^2 + bx + c where a, b, and c are real numbers and a ≠ 0. The graph of
a quadratic equation in two variables is a parabola which opens up if the leading 
coefficient is positive and opens down if the leading coefficient is negative. 

• A quadratic function is a function whose domain is R, whose range is R, and 
whose rule may be written as f(x) = ax^2 + bx + c where a, b, and c are real
numbers and a ≠ 0. The graph of a quadratic function is a parabola which opens
up if a > 0 and opens down if a < 0. A quadratic function is a mathematical
creature just like natural numbers, rational numbers, irrational numbers, matrices,
vectors, and linear polynomials are mathematical creatures. The concept of
solving a quadratic function is meaningless. The arithmetic-like concepts of addition,
multiplication, and subtraction in the context of all functions as well as quadratic 
functions do have meaning. A variety of additional facts peculiar to functions 
will be studied in this course. 
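
The remark that a quadratic equation in one variable has one point, two points, or no graph at all depends on the sign of the discriminant. The sketch below illustrates this in Python; it assumes the standard quadratic formula, which this document does not state, and the function name `real_solutions` is hypothetical.

```python
import math


def real_solutions(a, b, c):
    """Real solutions of ax^2 + bx + c = 0, assuming a != 0."""
    discriminant = b * b - 4 * a * c
    if discriminant < 0:
        return []                      # no real solutions, hence no graph
    if discriminant == 0:
        return [-b / (2 * a)]          # a single point on the x-axis
    root = math.sqrt(discriminant)
    return sorted([(-b - root) / (2 * a), (-b + root) / (2 * a)])


print(real_solutions(1, 1, -6))   # two points
print(real_solutions(1, -6, 9))   # one point
print(real_solutions(1, 0, 1))    # no points
```

Note that it is the equation x^2 + x - 6 = 0 that is being solved here, not the function f(x) = x^2 + x - 6, in keeping with the distinction drawn in the bullet points.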

IV) Polynomial Functions 

Definition: A polynomial function is a function whose domain is R, whose range is R, 
and whose rule can be expressed as a polynomial equation. 

Recall that a polynomial equation in two variables is an equation that can be written 

y = a_n x^n + a_(n-1) x^(n-1) + ... + a_1 x + a_0 where n is a whole number and each of the coefficients
a_i is a real number.

This leads to the following alternate preferred definition for a polynomial function.
Definition: A polynomial function is a function whose domain is R, whose range is R,
and whose rule can be expressed as f(x) = a_n x^n + a_(n-1) x^(n-1) + ... + a_1 x + a_0 where n is a
whole number and each of the coefficients a_i is a real number.

Examples: Every linear function is a polynomial function. 

Examples: Every quadratic function is a polynomial function. 


Example: Let f be the function whose rule is f(x) = x^5 + 3x^2 - 6x + 9.

Convention dictates that the range of this function is R and because the rule makes sense 
for every real number substituted into the rule, the domain is also R. The rule is written 
as a polynomial equation, so it follows that f is a polynomial function. 

Polynomial functions will be a major topic of study in College Algebra. 


V) Absolute Value Function 


Definition: The absolute value function abs is the function whose rule is given by 


abs(x) =  x   if x ≥ 0,
abs(x) = -x   if x < 0.


Convention dictates that the range of this function is R and, because the rule makes
sense for every real number substituted into the rule, the domain is also R.


The graph of abs is shown at the right. 

The name abs is less used in discussions involving the 
absolute value function. The more familiar notation 
| x | is usually used in lieu of abs(x). 
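
The two-case rule translates directly into a conditional. This is a small Python sketch for illustration; the name `abs_value` is an assumption, chosen to avoid clashing with Python's built-in `abs`, which computes the same thing for real numbers.

```python
def abs_value(x):
    """Absolute value by cases: x if x >= 0, otherwise -x."""
    if x >= 0:
        return x
    return -x


for x in (-3, 0, 2.5):
    print(x, abs_value(x))
```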



VI) Sequences 

Definition: A sequence is a function whose domain is the set of Natural Numbers N. 

The definition of sequence is pretty simply stated, but there are many special 
consequences of the fact that the domain of a sequence is N. Sequences have been 
studied for centuries. Sequences have been studied with and without the concept of 
function. Quite a bit of special (mostly historical in origin) terminology and notation is 
used in a discussion of sequences. 

When working with sequences, range elements are frequently called terms of the 
sequence. For example: 

• The range element associated with the domain element 1 is called the first term of 
the sequence. 
• The range element associated with the domain element 6 is called the sixth term
of the sequence.

• The range element associated with the domain element n is called the nth term of
the sequence.

Because the domain of every sequence is N, the graph of a sequence will consist of a set 
of discrete points. 

Because the domain of every sequence is N, it is possible to speak of the first domain 
element. 

Because the domain of every sequence is N, it is possible to speak of the next domain 
element. 

Because the domain of every sequence is N, it is not possible to speak of the last domain 
element. 

Because the domain of every sequence is N, it is possible to ask about and compute the 
sum of the first k terms. This is usually called the kth partial sum of the sequence.

Definition: The nth partial sum of a sequence is defined to be the sum of the first n terms
of the sequence. 
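
Because a sequence is a function on N, a partial sum is just a sum of function values. The Python sketch below illustrates the definition; the helper `partial_sum` and the example sequence `odd` (the odd numbers 1, 3, 5, ...) are hypothetical and do not come from this document.

```python
def partial_sum(sequence, n):
    """Sum of the first n terms: sequence(1) + sequence(2) + ... + sequence(n)."""
    return sum(sequence(k) for k in range(1, n + 1))


def odd(n):
    """Hypothetical example sequence: the nth odd number."""
    return 2 * n - 1


# First four terms are 1, 3, 5, 7.
print(partial_sum(odd, 4))
```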

VI - A) Tau 

The function whose name is the Greek letter τ (pronounced tau) is a function whose
domain is the Natural Numbers N. So τ is a sequence. The rule for τ is not given by a
formula. The rule for τ is: τ(n) is the number of positive divisors of n.

To compute the range value associated with a particular domain element n, it is necessary
to determine all positive divisors of n and simply count them. It is convenient to think of
τ simply as a function which counts the number of positive divisors of domain elements.

τ(1) = 1   τ(2) = 2   τ(3) = 2   τ(4) = 3    τ(5) = 2    τ(6) = 4

τ(7) = 2   τ(8) = 4   τ(9) = 3   τ(10) = 4   τ(11) = 2   τ(12) = 6

The graph of the first 12 terms of Tau consists of the points:

(1,1) (2,2) (3,2) (4,3) (5,2) (6,4)

(7,2) (8,4) (9,3) (10,4) (11,2) (12,6)

Notice that if p is a prime number, then τ(p) = 2 and if k is a composite number, then
τ(k) > 2. In fact sometimes τ is used to define prime numbers as:

Definition: A natural number p is prime if and only if τ(p) = 2.

Notice how neatly this definition prohibits classifying 1 as a prime number. 
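
Although the rule for tau is not a formula, it is still a rule that a program can carry out. The following Python sketch counts divisors by trial division; the names `tau` and `is_prime` are chosen for the example, and the method is a straightforward illustration rather than an efficient algorithm.

```python
def tau(n):
    """Number of positive divisors of the natural number n."""
    return sum(1 for d in range(1, n + 1) if n % d == 0)


def is_prime(n):
    """Prime test taken directly from the definition: exactly two divisors."""
    return tau(n) == 2


print([tau(n) for n in range(1, 13)])
print([n for n in range(1, 13) if is_prime(n)])
```

Since tau(1) is 1 rather than 2, the test rejects 1 without needing a special case.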
VI - B) Sigma

The function whose name is the Greek letter σ (pronounced sigma) is a function whose
domain is the Natural Numbers N. So σ is a sequence. The rule for σ is not given by a
formula. The rule for σ is: σ(n) is the sum of the positive divisors of n.

To compute the range value associated with a particular domain element n, it is necessary
to determine all positive divisors of n and simply add them.

σ(1) = 1   σ(2) = 3    σ(3) = 4    σ(4) = 7     σ(5) = 6     σ(6) = 12

σ(7) = 8   σ(8) = 15   σ(9) = 13   σ(10) = 18   σ(11) = 12   σ(12) = 28

The graph of the first 12 terms of Sigma consists of the points: 

(1,1) (2,3) (3,4) (4,7) (5,6) (6,12) 

(7,8) (8,15) (9,13) (10,18) (11,12) (12,28) 
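
The rule for sigma can be carried out in the same way as the rule for tau, adding the divisors instead of counting them. This Python sketch is an illustration by trial division; the name `sigma` is chosen for the example.

```python
def sigma(n):
    """Sum of the positive divisors of the natural number n."""
    return sum(d for d in range(1, n + 1) if n % d == 0)


# The first 12 terms, paired with their domain elements.
print([(n, sigma(n)) for n in range(1, 13)])
```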

VI - C) Fibonacci Sequence 

Definition: The Fibonacci sequence F is the function whose domain is N and whose rule 
is given recursively by: F(1) = 1, F(2) = 1, and for n > 2, F(n) = F(n - 1) + F(n - 2).

F(1) = 1    F(2) = 1    F(3) = 2    F(4) = 3     F(5) = 5     F(6) = 8

F(7) = 13   F(8) = 21   F(9) = 34   F(10) = 55   F(11) = 89   F(12) = 144

The graph of the first 12 terms of the Fibonacci sequence consists of the points: 

(1,1) (2,1) (3,2) (4,3) (5,5) (6,8) 

(7,13) (8,21) (9,34) (10,55) (11,89) (12,144) 
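
A recursive rule can be evaluated by building the terms up in order. The Python sketch below is an illustration; note that it uses a loop that carries two consecutive terms forward rather than literal recursion, which gives the same values while avoiding repeated work. The name `fib` is an assumption.

```python
def fib(n):
    """nth term of the Fibonacci sequence, with F(1) = F(2) = 1."""
    if n < 1:
        raise ValueError("the domain of a sequence is the natural numbers")
    current, following = 1, 1          # F(1) and F(2)
    for _ in range(n - 1):
        # Step forward one term: (F(k), F(k+1)) becomes (F(k+1), F(k+2)).
        current, following = following, current + following
    return current


print([fib(n) for n in range(1, 13)])
```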

VI - D) Arithmetic Sequences 

Definition: An arithmetic sequence is a sequence whose consecutive terms have a 
common difference. 

Equivalent Definition: An arithmetic sequence f is a function whose rule may be 
expressed as a linear equation of the form f(n) = dn + b where d is the common difference 
and b is the difference f(1) - d.

Comments: Compare the equivalent definition of an arithmetic sequence with the 
definition of a linear function. 

The domain of a linear function is R.
The domain of an arithmetic sequence is N.
The rule for a linear function is f(x) = mx + b.
The rule for an arithmetic sequence is f(x) = dx + b.

The number b in the rule for an arithmetic sequence is the range value that would be
associated with 0 (if 0 were a domain element), so it corresponds exactly to the
y-intercept of the linear function.
The common difference d is nothing more than the slope as you move from one range
element to the next. The slope of the line joining two terms (x, f(x)) and (x + 1, f(x + 1))
of the sequence is given by

(f(x + 1) - f(x)) / ((x + 1) - x) = d/1 = d.

A casual approach is to view an arithmetic sequence as a linear function with domain N. 


Comment: Given any two pieces of information about an arithmetic sequence it is 
possible to determine its rule. The next three problem types illustrate the point. 


Problem Type 1: If you are given the common difference and the first term of the 
arithmetic sequence, then it is possible to write the rule for the function. This is 
comparable to the slope-intercept situation/problem when working with linear functions. 
Example: Suppose an arithmetic sequence named h has a common difference 8 and the 
first term is -5. Find the rule for the function h. 

Solution: Since the function is an arithmetic sequence its rule is of the form
h(n) = dn + b. In our case the common difference d is 8, so the rule for h has the form
h(n) = 8n + b. Because the first term is -5, b = h(1) - d = -5 - 8 = -13 and the rule for the
desired arithmetic sequence is given by h(n) = 8n - 13.

Problem Type 2: If you are given the common difference d and one term of an 
arithmetic sequence, then it is possible to write the rule for the function. This is 
comparable to the point-slope situation/problems when working with linear functions. 
Example: Suppose an arithmetic sequence named h has a common difference 3 and the 
fifth term is 12. Find the rule for the function h. 

Solution: Since the function is an arithmetic sequence its rule is of the form
h(n) = dn + b. In our case the common difference d is 3, so the rule for h has the form
h(n) = 3n + b. Because the fifth term is 12, h(5) = 12, but according to the partially
determined rule h(5) = 3(5) + b = 15 + b. These two representations for h(5) yield the 
equation 12 = 15 + b. Clearly then b = -3 and the rule for the desired arithmetic sequence 
is given by h(n) = 3n - 3. 

Problem Type 3: If you are given two terms of an arithmetic sequence, then it is 
possible to write the rule for the function. This is comparable to the two point 
situation/problems when working with linear functions. 

Example: Suppose the fourth term of an arithmetic sequence named h is 10 and the seventh
term is 28. Find the rule for the function h.

Solution: Since the function is an arithmetic sequence its rule is of the form
h(n) = dn + b. The difference between the seventh and fourth terms is 3d and is also
equal to 28 - 10 = 18. That means 3d = 18 and so the common difference d is 6.

The rule for h has the form h(n) = 6n + b. Because the fourth term is 10, h(4) = 10, but
according to the partially determined rule h(4) = 6(4) + b = 24 + b. These two 
representations for h(4) yield the equation 10 = 24 + b. Clearly then b = -14 and the rule 
for the desired arithmetic sequence is given by h(n) = 6n - 14. 
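
The three problem types above can be sketched in a few lines of Python. This is only an illustrative sketch: the helper names (rule_from_first_term, rule_from_one_term, rule_from_two_terms) are hypothetical and not part of the original notes. Each helper returns the pair (d, b) of the rule h(n) = dn + b.

```python
from fractions import Fraction

def rule_from_first_term(d, first):
    """Problem Type 1: common difference d and first term h(1)."""
    # h(1) = d*1 + b, so b = h(1) - d
    return d, first - d

def rule_from_one_term(d, n, value):
    """Problem Type 2: common difference d and one term h(n) = value."""
    # h(n) = d*n + b, so b = value - d*n
    return d, value - d * n

def rule_from_two_terms(n1, v1, n2, v2):
    """Problem Type 3: two terms h(n1) = v1 and h(n2) = v2."""
    # The terms differ by (n2 - n1) steps of size d.
    d = Fraction(v2 - v1, n2 - n1)
    return d, v1 - d * n1

print(rule_from_first_term(8, -5))        # rule for the Type 1 example
print(rule_from_one_term(3, 5, 12))       # rule for the Type 2 example
print(rule_from_two_terms(4, 10, 7, 28))  # rule for the Type 3 example
```

Fractions are used in the third helper so that the division stays exact even when the common difference is not a whole number.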

Comment: The formula for the nth partial sum of an arithmetic sequence named a is: 

S_n = (n/2)(a_1 + a_n) 
VI - D) Mod Functions 

The function named mod6 

The name of this function is mod6. The domain of mod6 is the set of all natural numbers. 
The range of mod6 is the set {0,1,2,3,4,5}. Because the domain of mod6 is N, mod6 is a 
sequence. The rule for mod6 is given by: 

mod6(n) is the remainder when n is divided by 6. 

To see that mod6 is a function, we refer back to the “arrow” concept of function. The fact 
that every natural number may be divided by 6 ensures that an arrow emanates from each 
element of the domain. 

The division algorithm states that for any natural number n there is a unique quotient q 
and a unique remainder r such that n = 6q + r and r ∈ {0,1,2,3,4,5}. 

The fact that r ∈ {0,1,2,3,4,5} ensures that the arrows end in the range and the fact that 
the remainder is unique ensures that only one arrow emanates from each domain element. 

Therefore mod6 is a function. 

Here are examples of range elements associated with some domain elements: 
mod6(3) = 3    mod6(8) = 2    mod6(17) = 5    mod6(424) = 4 

The function named mod11 

The name of this function is mod11. The domain of mod11 is the set of all natural 
numbers. The range of mod11 is the set {0,1,2,3,4,5,6,7,8,9,10}. Because the domain of 
mod11 is N, mod11 is a sequence. The rule for mod11 is given by: 

mod11(n) is the remainder when n is divided by 11. 

To see that mod11 is a function, we refer back to the “arrow” concept of function. The 
fact that every natural number may be divided by 11 ensures that an arrow emanates from 
each element of the domain. 

The division algorithm states that for any natural number n there is a unique quotient q 
and a unique remainder r such that n = 11q + r and r ∈ {0,1,2,3,4,5,6,7,8,9,10}. 

The fact that r ∈ {0,1,2,3,4,5,6,7,8,9,10} ensures that the arrows end in the range and the 
fact that the remainder is unique ensures that only one arrow emanates from each domain 
element. 

Therefore mod11 is a function. 

Here are examples of range elements associated with some domain elements: 
mod11(3) = 3    mod11(8) = 8    mod11(17) = 6    mod11(424) = 6 

It should be clear that for each natural number k there is a corresponding modk function 
defined in the same manner as mod6 and mod11 above. 
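
As a rough illustration of the family of mod functions, the sketch below builds one function per modulus from the division algorithm. The name make_mod is a hypothetical helper introduced here; Python's built-in divmod supplies the quotient and the remainder.

```python
def make_mod(k):
    """Return the function mod_k: n -> remainder when n is divided by k."""
    def mod_k(n):
        q, r = divmod(n, k)          # division algorithm: n = k*q + r
        assert n == k * q + r and 0 <= r < k
        return r
    return mod_k

mod6 = make_mod(6)
mod11 = make_mod(11)

for n in (3, 8, 17, 424):
    print(n, mod6(n), mod11(n))
```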
VII) Exponential Function exp 

e is one of those special numbers in mathematics, like pi, that 
keeps showing up in all kinds of important places. Like pi, e is 
an irrational number. The value of e may be approximated by 
e ≈ 2.7182818284 

This irrational number is the foundation of a very important 
pair of functions in mathematics. These two functions are exp 
and ln (the first letter is ell). The function exp is quite easy to 
define. 

The function exp has domain R, and its range is the set of all 
positive real numbers. The rule for exp is given by the 
exponential equation exp(x) = e^x. The graph of exp is shown here. 

[Figure: graph of exp] 

The function exp is called the exponential function base e. 


The function ln is called the natural 
logarithm function. 

The domain of ln is all positive real 
numbers. 

The rule for ln is given in terms of its interrelation 
with the function exp. The function 
ln is the one and only function which has the 
property that 

ln(exp(x)) = x for every real x, and exp(ln(x)) = x for every positive real x. 

The graph of ln is shown at the right. 

[Figure: graph of ln] 
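
The interrelation between exp and ln can be checked numerically. The sketch below assumes Python's standard math.exp and math.log (natural logarithm) as stand-ins for exp and ln; because floating-point arithmetic is approximate, the comparison uses a tolerance rather than exact equality.

```python
import math

exp = math.exp   # exp(x) = e**x, defined for every real x
ln = math.log    # natural logarithm, defined for positive x only

def round_trips(x, tol=1e-12):
    """Check ln(exp(x)) == x, up to floating-point error."""
    return math.isclose(ln(exp(x)), x, rel_tol=tol, abs_tol=tol)

for x in (-2.0, 0.0, 1.0, 3.5):
    print(x, exp(x), round_trips(x))
```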
VIII) Circular Functions 

We will now consider the functions whose names are sn and cs, whose domain is the 
closed interval [0, 2π] and whose range is the closed interval [-1, 1]. These and four 
other functions are called circular functions or, more traditionally, trigonometric 
functions. 


Observe that the length of this domain is 2π. Here is a picture of the domain of the function 
named sn. A few points which we will use later in the discussion are marked. 

[Figure: the interval [0, 2π] with several points marked] 
The rule for this function will not be described with an 
equation but will instead be described in terms of the 
coordinates of points on the unit circle. 

Recall that the unit circle is the circle with radius 1 whose 
center is at the origin of the Cartesian coordinate system 
and is described by the equation x^2 + y^2 = 1. 

Also recall that the circumference of a circle is given by the 
formula C = 2πr. In the case of the unit circle, the 
circumference is 2π. This circumference is exactly the 
same length as the domain of the function named sn. 

The significance of this comparison is that for any real number in the domain of the 
function named sn, there is a corresponding point on the unit circle. The converse is also 
true: for every point on the circumference of the unit circle there is a real number in the 
domain of the function. 

On the unit circle the point (1, 0) is always considered the starting point and distance is 
always measured on the circumference in the counterclockwise direction. 

• The point on the unit circle which corresponds to π/2 in the domain of sn is the 
point with coordinates (0, 1). 

• The point on the unit circle which corresponds to π in the domain of sn is the 
point with coordinates (-1, 0). 

• The point on the unit circle which corresponds to 3π/2 in the domain of sn is the 
point with coordinates (0, -1). 

• The point on the unit circle which corresponds to 2π in the domain of sn is the 
point with coordinates (1, 0). 

Whether in the domain of sn or on the circumference of the unit circle, these four points 
are at the starting point, 1/4 the total distance, 1/2 the total distance, and 3/4 the total distance. 

We are now ready to provide the rule for the functions named sn and cs. 

RULE: For any x ∈ [0, 2π], sn(x) is the second coordinate of the corresponding point on 
the circumference of the unit circle. 

RULE: For any x ∈ [0, 2π], cs(x) is the first coordinate of the corresponding point on the 
circumference of the unit circle. 


The above diagram shows that: 

sn(0) = 0,  sn(π/2) = 1,  sn(π) = 0,   sn(3π/2) = -1,  sn(2π) = 0 
cs(0) = 1,  cs(π/2) = 0,  cs(π) = -1,  cs(3π/2) = 0,   cs(2π) = 1 


We will now look at the range values associated with a few 
other domain elements. In particular we will examine those 
numbers (domain elements) midway between each pair of the 
previous four numbers in the domain of sn and cs. 


Recall the rules for the functions sn and cs and extract the 
following range values directly from the picture at the right. 
The figure at the right shows additional points in the domain of 
sn and cs with their coordinates on the unit circle. From this 
diagram and the rules for the two functions, the range values at 
these points can be read off directly. 

[Figure: unit circle with additional points and their coordinates] 

In a manner similar to these few examples, the unique range 
values associated with any domain element may be determined. 
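
The rules for sn and cs can be sketched in code. The sketch below rests on one assumption not spelled out in the text: that travelling a distance x counterclockwise from (1, 0) along the unit circle lands on the point (cos x, sin x), so Python's math.cos and math.sin are used to obtain the coordinates. The helper name point_on_unit_circle is hypothetical.

```python
import math

def point_on_unit_circle(x):
    """Point reached after travelling distance x counterclockwise from (1, 0)."""
    if not 0 <= x <= 2 * math.pi:
        raise ValueError("x must lie in the domain [0, 2*pi]")
    return (math.cos(x), math.sin(x))

def sn(x):
    """Second coordinate of the corresponding point."""
    return point_on_unit_circle(x)[1]

def cs(x):
    """First coordinate of the corresponding point."""
    return point_on_unit_circle(x)[0]

# Quarter points, followed by the points midway between them.
for k in (0, 2, 4, 6, 8, 1, 3, 5, 7):
    x = k * math.pi / 4
    print(f"x = {k}*pi/4: sn = {sn(x):+.4f}, cs = {cs(x):+.4f}")
```

Values that should be exactly 0 may print as tiny nonzero numbers because of floating-point rounding.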



Each point in the domain of sn and cs corresponds with a point on the unit circle which in 
turn corresponds with a set of first and second coordinates which determine the unique 
range value associated with the domain element. 


The two functions sn and cs and four other circular functions are the functions studied in 
Trigonometry. How these functions relate to angles, triangles, radian measure, etc. will 
not be discussed here. The purpose here is simply to give an illustration of some 
functions whose rules are unusual and whose domains and ranges are not all of R. 
Functions 

• Definition: Let A and B be two sets. A function from A to B, 
denoted f : A → B, is an assignment of exactly one element of 
B to each element of A. We write f(a) = b to denote the 
assignment of b to an element a of A by the function f. 

[Figure: arrow diagram of f : A → B] 
Injective function 

Definition: A function f is said to be one-to-one, or injective, if 
and only if f(x) = f(y) implies x = y for all x, y in the domain of 
f. A function is said to be an injection if it is one-to-one. 

Alternative: A function is one-to-one if and only if f(x) ≠ f(y) 
whenever x ≠ y. This is the contrapositive of the definition. 

[Figure: arrow diagrams of a function that is not injective and of an injective function] 
Bijective functions 

Example 1: 

• Let A = {1,2,3} and B = {a,b,c} 

- Define f as 

• 1 → c 

• 2 → a 

• 3 → b 

• Is f a bijection? 

• Yes. It is both one-to-one and onto. 
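
For finite sets, the check in Example 1 can be carried out mechanically. The sketch below represents f as a Python dictionary and assumes the usual meaning of onto (every element of B is the image of some element of A); the helper names are hypothetical.

```python
def is_injective(mapping):
    """One-to-one: no two inputs share an image."""
    images = list(mapping.values())
    return len(set(images)) == len(images)

def is_surjective(mapping, codomain):
    """Onto: every element of the codomain is hit."""
    return set(mapping.values()) == set(codomain)

def is_bijective(mapping, codomain):
    return is_injective(mapping) and is_surjective(mapping, codomain)

B = {"a", "b", "c"}
f = {1: "c", 2: "a", 3: "b"}   # the function from Example 1
print(is_bijective(f, B))
```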


M. Hauskrecht 
Bijective functions 

Example 2: 

• Define g : W → W (whole numbers), where 
g(n) = ⌊n/2⌋ (floor function). 

• 0 → ⌊0/2⌋ = ⌊0⌋ = 0 

• 1 → ⌊1/2⌋ = 0 

• 2 → ⌊2/2⌋ = ⌊1⌋ = 1 

• 3 → ⌊3/2⌋ = 1 

• Is g a bijection? 

- No. g is onto but not 1-1 (g(0) = g(1) = 0, however 0 ≠ 1). 
Bijective functions 

Theorem: Let f be a function f: A → A from a set A to itself, 
where A is finite. Then f is one-to-one if and only if f is onto. 

Proof: 

→ A is finite and f is one-to-one (injective) 

• Is f an onto function (surjection)? 

• Yes. Every element of A maps to exactly one element, and injectivity 
ensures that these images are all different. So the elements of A map 
to |A| different elements. Since f: A → A, the whole co-domain is 
covered; thus the function is also a surjection (and a bijection). 

← A is finite and f is an onto function 

• Is the function one-to-one? 

• Yes. Every element maps to exactly one element and all 
elements in A are covered. If two elements shared an image, fewer 
than |A| elements would be covered. Thus the mapping must be one-to-one. 
Bijective functions 

Theorem. Let f be a function from a set A to itself, where A is 
finite. Then f is one-to-one if and only if f is onto. 

Please note the above is not true when A is an infinite set. 

• Example: 

- f : Z → Z, where f(z) = 2 * z. 

- f is one-to-one but not onto. 

• 1 → 2 

• 2 → 4 

• 3 → 6 

- 3 has no pre-image. 
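
For a small finite set the theorem can be confirmed by brute force. The sketch below, an illustration rather than a proof, enumerates every function from A = {0, 1, 2} to itself and checks that one-to-one and onto always agree; the set A and the helper names are assumptions chosen for the example. The last lines look at f(z) = 2 * z on a finite window of Z to show that 3 is never produced.

```python
from itertools import product

def is_injective(images):
    return len(set(images)) == len(images)

def is_surjective(images, codomain):
    return set(images) == set(codomain)

A = (0, 1, 2)
# Each tuple lists (f(0), f(1), f(2)) for one function f: A -> A.
all_functions = list(product(A, repeat=len(A)))
agree = all(is_injective(f) == is_surjective(f, A) for f in all_functions)
print(len(all_functions), agree)

# Infinite case: doubling on a window of the integers never yields 3.
doubles = {2 * z for z in range(-10, 11)}
print(3 in doubles)
```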


Functions on real numbers 

Definition: Let f1 and f2 be functions from A to R (reals). Then 
f1 + f2 and f1 * f2 are also functions from A to R defined by 

• (f1 + f2)(x) = f1(x) + f2(x) 

• (f1 * f2)(x) = f1(x) * f2(x). 

Examples: 

• Assume 

• f1(x) = x - 1 

• f2(x) = x^3 + 1 

then 

• (f1 + f2)(x) = x^3 + x 

• (f1 * f2)(x) = x^4 - x^3 + x - 1. 
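
The pointwise definitions translate directly into code. In this sketch the helper names add and mul are hypothetical; each takes two functions and returns a new function.

```python
def add(f1, f2):
    """Pointwise sum: (f1 + f2)(x) = f1(x) + f2(x)."""
    return lambda x: f1(x) + f2(x)

def mul(f1, f2):
    """Pointwise product: (f1 * f2)(x) = f1(x) * f2(x)."""
    return lambda x: f1(x) * f2(x)

f1 = lambda x: x - 1
f2 = lambda x: x**3 + 1

total = add(f1, f2)
prod = mul(f1, f2)
print(total(2), prod(2))
```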
Increasing and decreasing functions 

Definition: A function f whose domain and codomain are subsets 
of real numbers is strictly increasing if f(x) > f(y) whenever x > 
y and x and y are in the domain of f. Similarly, f is called 
strictly decreasing if f(x) < f(y) whenever x > y and x and y are 
in the domain of f. 

Example: 

• Let g : R → R, where g(x) = 2x - 1. Is it increasing? 

• Proof. 

For x > y we have 2x > 2y, and consequently 2x - 1 > 2y - 1, that is, g(x) > g(y). 

Thus g is strictly increasing. 
Note: Strictly increasing and strictly decreasing functions are one-to-one. 

Why? 

One-to-one function: A function is one-to-one if and only if f(x) ≠ f(y) 
whenever x ≠ y. If x ≠ y, then either x > y or y > x, and in both cases a 
strictly increasing (or strictly decreasing) function gives f(x) ≠ f(y). 
Identity function 

Definition: Let A be a set. The identity function on A is the 
function i_A : A → A where i_A(x) = x. 

Example: 

• Let A = {1,2,3} 

Then: 

• i_A(1) = 1 

• i_A(2) = 2 

• i_A(3) = 3. 
Inverse functions 

Definition: Let f be a bijection from set A to set B. The inverse 
function of f is the function that assigns to an element b from B 
the unique element a in A such that f(a) = b. The inverse 
function of f is denoted by f^(-1). Hence, f^(-1)(b) = a, when f(a) = b. 

If the inverse function of f exists, f is called invertible. 

[Figure: arrow diagrams of a bijective f : A → B and its inverse f^(-1) : B → A] 
Inverse functions 

Note: if f is not a bijection then it is not possible to define the 
inverse function of f. Why? 

Assume f is not one-to-one: 

The inverse is not a function: one element of B is mapped to two 
different elements of A. 

[Figure: arrow diagrams of f : A → B and the 'inverse' B → A] 
Assume f is not onto: 

The inverse is not a function: one element of B is not assigned any 
value in A. 
Inverse functions 

Example 1: 

• Let A = {1,2,3} and i_A be the identity function 

• i_A(1) = 1    i_A^(-1)(1) = 1 

• i_A(2) = 2    i_A^(-1)(2) = 2 

• i_A(3) = 3    i_A^(-1)(3) = 3 

• Therefore, the inverse function of i_A is i_A. 
Inverse functions 

Example 2: 

• Let g : R → R, where g(x) = 2x - 1. 

• What is the inverse function g^(-1)? 

Approach to determine the inverse: 

y = 2x - 1 => y + 1 = 2x 
           => (y+1)/2 = x 

• Define g^(-1)(y) = x = (y+1)/2 

Test the correctness of inverse: 

• g(3) = 2*3 - 1 = 5 

• g^(-1)(5) = (5+1)/2 = 3 

• g(10) = 2*10 - 1 = 19 

• g^(-1)(19) = (19+1)/2 = 10. 
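
The same correctness test can be run over many inputs at once. This is a minimal sketch; the name g_inv is a hypothetical stand-in for g^(-1).

```python
def g(x):
    return 2 * x - 1

def g_inv(y):
    # Solve y = 2x - 1 for x.
    return (y + 1) / 2

for x in (3, 10, -4, 0):
    y = g(x)
    print(x, y, g_inv(y))   # the last column reproduces the first
```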
Composition of functions 

Example 1: 

• Let A = {1,2,3} and B = {a,b,c,d} 

g : A → A,        f : A → B 

1 → 3             1 → b 

2 → 1             2 → a 

3 → 2             3 → d 

f ∘ g : A → B: 

• 1 → d 

• 2 → b 

• 3 → a 
Composition of functions 

Example 2: 

• Let f and g be two functions from Z to Z, where 

• f(x) = 2x and g(x) = x^2. 

• f ∘ g : Z → Z 

• (f ∘ g)(x) = f(g(x)) 
             = f(x^2) 
             = 2(x^2) 

• g ∘ f : Z → Z 

• (g ∘ f)(x) = g(f(x)) 
             = g(2x) 
             = (2x)^2 
             = 4x^2 

Note that the order of the function composition matters. 
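
Both composition examples can be reproduced with a small helper. The name compose is hypothetical; the dictionaries f_map and g_map encode the finite functions f and g from Example 1.

```python
def compose(f, g):
    """Return f o g, the function x -> f(g(x))."""
    return lambda x: f(g(x))

# Example 2: f(x) = 2x and g(x) = x^2 on the integers.
f = lambda x: 2 * x
g = lambda x: x**2
f_after_g = compose(f, g)   # x -> 2x^2
g_after_f = compose(g, f)   # x -> 4x^2
print(f_after_g(3), g_after_f(3))

# Example 1: finite functions stored as dictionaries.
g_map = {1: 3, 2: 1, 3: 2}
f_map = {1: "b", 2: "a", 3: "d"}
fg_map = {x: f_map[g_map[x]] for x in g_map}
print(fg_map)
```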
Composition of functions 

Example 3: 

• (f ∘ f^(-1))(x) = x and (f^(-1) ∘ f)(x) = x, for all x. 

• Let f : R → R, where f(x) = 2x - 1 and f^(-1)(x) = (x+1)/2. 

• (f ∘ f^(-1))(x) = f(f^(-1)(x)) 
                  = f((x+1)/2) 
                  = 2((x+1)/2) - 1 
                  = (x+1) - 1 
                  = x 

• (f^(-1) ∘ f)(x) = f^(-1)(f(x)) 
                  = f^(-1)(2x - 1) 
                  = ((2x - 1) + 1)/2 
                  = (2x)/2 
                  = x 

CS 441 Discrete mathematics for CS Hauskrecht 


Some functions 

Definitions: 

• The floor function assigns to the real number x the largest integer 
that is less than or equal to x. The floor function is denoted by ⌊x⌋. 

• The ceiling function assigns to the real number x the smallest 
integer that is greater than or equal to x. The ceiling function is 
denoted by ⌈x⌉. 

Other important functions: 

• Factorials: n! = n(n-1)! such that 1! = 1 
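
These three functions are available in, or easy to write with, Python's standard library. The sketch uses math.floor and math.ceil and a recursive factorial that follows the rule n! = n(n-1)! with 1! = 1; the function name factorial is chosen here for illustration.

```python
import math

def factorial(n):
    """n! = n * (n-1)! with 1! = 1, for integers n >= 1."""
    if n < 1:
        raise ValueError("defined here for n >= 1 only")
    return 1 if n == 1 else n * factorial(n - 1)

for x in (2.7, -2.7, 3.0):
    print(x, math.floor(x), math.ceil(x))

print(factorial(5))
```

Note that for negative numbers the floor moves away from zero, since it is the largest integer not exceeding x.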
Chapter three: 
Functions and their graphs 
1 Functions 


In this Chapter we will cover various aspects of functions. We will look at the definition of 
a function, the domain and range of a function, what we mean by specifying the domain 
of a function, and the absolute value function. 


1.1 What is a function? 

1.1.1 Definition of a function 

A function f from a set of elements X to a set of elements Y is a rule that 
assigns to each element x in X exactly one element y in Y. 


One way to demonstrate the meaning of this definition is by using arrow diagrams. 

[Figure: arrow diagrams for f and g] 

f : X → Y is a function. Every element 
in X has associated with it exactly one 
element of Y. 


g : X → Y is not a function. The element 
1 in set X is assigned two elements, 
5 and 6 in set Y. 


A function can also be described as a set of ordered pairs (x, y) such that for any x-value in 
the set, there is only one y-value. This means that there cannot be any repeated x-values 
with different y-values. 

The examples above can be described by the following sets of ordered pairs. 


F = {(1,5),(3,3),(2,3),(4,2)} is a function. 

G = {(1,5),(4,2),(2,3),(3,3),(1,6)} is not a function. 
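
The ordered-pair description gives a direct test: a set of pairs is a function exactly when no x-value appears with two different y-values. The helper name is_function below is hypothetical.

```python
def is_function(pairs):
    """True if no x-value is paired with two different y-values."""
    seen = {}
    for x, y in pairs:
        if x in seen and seen[x] != y:
            return False      # repeated x-value with a different y-value
        seen[x] = y
    return True

F = {(1, 5), (3, 3), (2, 3), (4, 2)}
G = {(1, 5), (4, 2), (2, 3), (3, 3), (1, 6)}
print(is_function(F), is_function(G))
```

Repeated y-values, such as (3,3) and (2,3) in F, are allowed; only repeated x-values with different partners break the rule.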

The definition we have given is a general one. While in the examples we have used numbers 
as elements of X and Y, there is no reason why this must be so. However, in these notes 
we will only consider functions where X and Y are subsets of the real numbers. 

In this setting, we often describe a function using the rule, y = f(x), and create a graph 
of that function by plotting the ordered pairs (x, f(x)) on the Cartesian Plane. This 
graphical representation allows us to use a test to decide whether or not we have the 
graph of a function: The Vertical Line Test. 
1.1.2 The Vertical Line Test 


The Vertical Line Test states that if it is not possible to draw a vertical line through a 
graph so that it cuts the graph in more than one point, then the graph is the graph of a function. 

[Figure: two graphs, left and right] 

Left: This is the graph of a function. All possible vertical lines will cut this graph only 
once. 

Right: This is not the graph of a function. The vertical line we have drawn cuts the 
graph twice. 


1.1.3 Domain of a function 

For a function f : X → Y the domain of f is the set X. 

This also corresponds to the set of x-values when we describe a function as a set of ordered 
pairs (x, y). 

If only the rule y = f(x) is given, then the domain is taken to be the set of all real x for 
which the function is defined. For example, y = √x has domain: all real x ≥ 0. This is 
sometimes referred to as the natural domain of the function. 


1.1.4 Range of a function 

For a function f : X → Y the range of f is the set of y-values such that y = f(x) for 
some x in X. 

This corresponds to the set of y-values when we describe a function as a set of ordered 
pairs (x, y). The function y = √x has range: all real y ≥ 0. 

Example 

a. State the domain and range of y = √(x + 4). 

b. Sketch, showing significant features, the graph of y = √(x + 4). 
Solution 


a. The domain of y = √(x + 4) is all real x ≥ -4. We know that square root functions are 
only defined for non-negative numbers so we require that x + 4 ≥ 0, ie x ≥ -4. We also 
know that square root functions are never negative so the range of y = √(x + 4) is 
all real y ≥ 0. 

b. 

[Figure] The graph of y = √(x + 4). 


Example 

a. State the equation of the parabola sketched below, which has vertex (3, -3). 

[Figure: parabola with vertex (3, -3)] 

b. Find the domain and range of this function. 

Solution 

a. The equation of the parabola is y = (x^2 - 6x)/3. 

b. The domain of this parabola is all real x. The range is all real y ≥ -3. 

Example 

Sketch x^2 + y^2 = 16 and explain why it is not the graph of a function. 

Solution 

x^2 + y^2 = 16 is not a function as it fails the vertical line test. For example, when x = 0, 
y = ±4. 
c. f(x^2) = 3(x^2) - (x^2)^2 = 3x^2 - x^4 


d. 

(f(2 + h) - f(2))/h = ((3(2 + h) - (2 + h)^2) - (3(2) - (2)^2))/h 
                    = (6 + 3h - (h^2 + 4h + 4) - 2)/h 
                    = (-h^2 - h)/h 
                    = -h - 1 
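
The simplification in part d can be spot-checked with exact arithmetic. The sketch assumes f(x) = 3x - x^2, which is the rule implied by the working in parts c and d; the helper name difference_quotient is hypothetical.

```python
from fractions import Fraction

def f(x):
    # Rule inferred from the worked solution: f(x) = 3x - x^2
    return 3 * x - x**2

def difference_quotient(func, a, h):
    """(func(a + h) - func(a)) / h for h != 0."""
    return (func(a + h) - func(a)) / h

for h in (Fraction(1, 2), Fraction(-3), Fraction(1, 100)):
    print(h, difference_quotient(f, 2, h), -h - 1)
```

Fractions keep every step exact, so the two printed columns agree exactly rather than approximately.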

Example 

Sketch the graph of the function f(x) = (x - 1)^2 + 1 and show that f(p) = f(2 - p). 
Illustrate this result on your graph by choosing one value of p. 

Solution 

[Figure] The graph of f(x) = (x - 1)^2 + 1. 

f(2 - p) = ((2 - p) - 1)^2 + 1 
         = (1 - p)^2 + 1 
         = (p - 1)^2 + 1 
         = f(p) 

The sketch illustrates the relationship f(p) = f(2 - p) for p = -1. If p = -1 then 
2 - p = 2 - (-1) = 3, and f(-1) = f(3). 

1.2 Specifying or restricting the domain of a function 

We sometimes give the rule y = f(x) along with the domain of definition. This domain 
may not necessarily be the natural domain. For example, if we have the function 

y = x^2 for 0 ≤ x ≤ 2 

then the domain is given as 0 ≤ x ≤ 2. The natural domain has been restricted to the 
subinterval 0 ≤ x ≤ 2. 

Consequently, the range of this function is all real y where 0 ≤ y ≤ 4. We can best 
illustrate this by sketching the graph. 

[Figure] The graph of y = x^2 for 0 ≤ x ≤ 2. 
1.3 The absolute value function 


Before we define the absolute value function we will review the definition of the absolute 
value of a number. 

The absolute value of a number x is written |x| and is defined as 

|x| = x if x ≥ 0 or |x| = -x if x < 0. 

That is, |4| = 4 since 4 is positive, but |-2| = 2 since -2 is negative. 

We can also think of |x| geometrically as the distance of x from 0 on the number line. 

[Figure: number line showing |-2| = 2 and |4| = 4 as distances from 0] 

More generally, |x - a| can be thought of as the distance of x from a on the number line. 

[Figure: number line showing |a - x| = |x - a| as the distance between a and x] 

Note that |a - x| = |x - a|. 

The absolute value function is written as y = |x|. 

We define this function as 

y = x   if x ≥ 0 
y = -x  if x < 0 

From this definition we can graph the function by taking each part separately. The graph 
of y = |x| is given below. 

[Figure] The graph of y = |x|. 
Example 

Sketch the graph of y = |x - 2|. 

Solution 

For y = |x - 2| we have 

y = +(x - 2) when x - 2 ≥ 0 or x ≥ 2 
y = -(x - 2) when x - 2 < 0 or x < 2 

That is, 

y = x - 2   for x ≥ 2 
y = -x + 2  for x < 2 

Hence we can draw the graph in two parts. 

[Figure] The graph of y = |x - 2|. 

We could have sketched this graph by first of all sketching the graph of y = x - 2 and 
then reflecting the negative part in the x-axis. We will use this fact to sketch graphs of 
this type in Chapter 2. 
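
The two-part description of y = |x - 2| can be compared against a direct computation. In this sketch the piecewise function is written by hand (the names absolute_value and y_piecewise are hypothetical) and checked against Python's built-in abs.

```python
def absolute_value(x):
    """|x| = x if x >= 0, and -x if x < 0."""
    return x if x >= 0 else -x

def y_piecewise(x):
    """y = |x - 2| written in two parts."""
    if x >= 2:
        return x - 2
    return -x + 2

for x in (-3, 0, 2, 5):
    print(x, y_piecewise(x), abs(x - 2))
```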


1.4 Exercises 

1. a. State the domain and range of f(x) = √(9 - x^2). 
b. Sketch the graph of y = √(9 - x^2). 

2. Given ψ(x) = x^2 + 5, find, in simplest form, (ψ(x + h) - ψ(x))/h, h ≠ 0. 

3. Sketch the following functions stating the domain and range of each: 
a. y = √(x - 1) 
b. y = |2x| 
c. y = [expression illegible] 
d. y = |2x| - 1. 

4. a. Find the perpendicular distance from (0, 0) to the line x + y + k = 0. 

b. If the line x + y + k = 0 cuts the circle x^2 + y^2 = 4 in two distinct points, find the 
restrictions on k. 

5. Sketch the following, showing their important features. 

a. [expression illegible] 

b. y^2 = x^2. 

6. Explain the meanings of function, domain and range. Discuss whether or not y^2 = x^3 
is a function. 

7. Sketch the following relations, showing all intercepts and features. State which ones 
are functions giving their domain and range. 

a. y = -√(4 - x^2) 

b. |x| - |y| = 0 

c. y = x^3 

d. y = [expression illegible], x ≠ 0 

e. |y| = x. 

8. If A(x) = x^2 + 2 + 1/x^2, x ≠ 0, prove that A(p) = A(1/p) for all p ≠ 0. 

9. Write down the values of x which are not in the domain of the following functions: 
a. f(x) = √(x^2 - 4x) 

b. g(x) = [expression illegible] 

10. If φ(x) = log [argument illegible], find in simplest form: 

a. φ(3) + φ(4) + φ(5) 

b. φ(3) + φ(4) + φ(5) + ... + φ(n) 

11. a. If y = x^2 + 2x and x = (z - 2)^2, find y when z = 3. 

b. Given L(x) = 2x + 1 and M(x) = x^2 - x, find 

i L(M(x)) 

ii M(L(x)) 
12. Using the sketches, find the value(s) of the constants in the given equations: 

[Figures: sketches for exercise 12] 

13. a. Define |a|, the absolute value of a, where a is real. 
b. Sketch the relation |x| + |y| = 1. 

14. Given that S(n) = n/(2n + 1), find an expression for S(n - 1). 
Hence show that S(n) - S(n - 1) = 1/((2n - 1)(2n + 1)). 
2 More about functions 


In this Chapter we will look at the effects of stretching, shifting and reflecting the basic 
functions, y = x^2, y = x^3, y = 1/x, y = |x|, y = a^x, x^2 + y^2 = r^2. We will introduce the 
concepts of even and odd functions, increasing and decreasing functions and will solve 
equations using graphs. 


2.1 Modifying functions by shifting 

2.1.1 Vertical shift 

We can draw the graph of y = f(x) + k from the graph of y = f(x) as the addition of 
the constant k produces a vertical shift. That is, adding a constant to a function moves 
the graph up k units if k > 0 or down |k| units if k < 0. For example, we can sketch the 
function y = x^2 - 3 from our knowledge of y = x^2 by shifting the graph of y = x^2 down 
by 3 units. That is, if f(x) = x^2 then f(x) - 3 = x^2 - 3. 

[Figure: graphs of y = x^2 and y = x^2 - 3] 

We can also write y = f(x) - 3 as y + 3 = f(x), so replacing y by y + 3 in y = f(x) also 
shifts the graph down by 3 units. 


2.1.2 Horizontal shift 

We can draw the graph of y = f(x - a) if we know the graph of y = f(x) as placing the 
constant a inside the brackets produces a horizontal shift. If we replace x by x - a inside 
the function then the graph will shift to the left by |a| units if a < 0 and to the right by a 
units if a > 0. 

For example we can sketch the graph of y = 1/(x - 2) from our knowledge of y = 1/x by shifting 
this graph to the right by 2 units. That is, if f(x) = 1/x then f(x - 2) = 1/(x - 2). 

[Figure: graphs of y = 1/x and y = 1/(x - 2)] 

Note that the function y = 1/(x - 2) is not defined at x = 2. The point (1,1) has been shifted 
to (3,1). 


2.2 Modifying functions by stretching 

We can sketch the graph of a function y = bf(x) (b > 0) if we know the graph of y = f(x),
as multiplying by the constant b will have the effect of stretching the graph in the
y-direction by a factor of b. That is, multiplying f(x) by b will change all of the y-values
proportionally.

For example, we can sketch y = 2x^2 from our knowledge of y = x^2 as follows:




The graph of y = 2x^2. Note, all the y-values have been multiplied by 2, but the
x-values are unchanged.


We can sketch the graph of y = (1/2)x^2 from our knowledge of y = x^2 as follows:



The graph of y = (1/2)x^2. Note, all the y-values have been multiplied by 1/2, but
the x-values are unchanged.


2.3 Modifying functions by reflections

2.3.1 Reflection in the x-axis


We can sketch the function y = -f(x) if we know the graph of y = f(x), as a minus sign
in front of f(x) has the effect of reflecting the whole graph in the x-axis. (Think of the
x-axis as a mirror.) For example, we can sketch y = -|x| from our knowledge of y = |x|.




The graph of y = -|x|. It is the reflection of y = |x| in the x-axis.


2.3.2 Reflection in the y-axis


We can sketch the graph of y = f(-x) if we know the graph of y = f(x), as the graph of
y = f(-x) is the reflection of y = f(x) in the y-axis.

For example, we can sketch y = 3^(-x) from our knowledge of y = 3^x.




The graph of y = 3^(-x). It is the reflection of y = 3^x in the y-axis.


2.4 Other effects 

We can sketch the graph of y = |f(x)| if we know the graph of y = f(x), as the effect of the
absolute value is to reflect all of the negative values of f(x) in the x-axis. For example,
we can sketch the graph of y = |x^2 - 3| from our knowledge of the graph of y = x^2 - 3.



The graph of y = |x^2 - 3|. The negative values of y = x^2 - 3 have been reflected
in the x-axis.


2.5 Combining effects 

We can use all the above techniques to graph more complex functions. For example, we
can sketch the graph of y = 2 - (x + 1)^2 from the graph of y = x^2 provided we can analyse
the combined effects of the modifications. Replacing x by x + 1 (or x - (-1)) moves the
graph to the left by 1 unit. The minus sign in front of the brackets turns the
graph upside down. Adding 2 moves the graph up 2 units. We can illustrate
these effects in the following diagrams.




The graph of y = (x + 1)^2. The graph of y = x^2 has been shifted 1 unit to the
left.



The graph of y = -(x + 1)^2. The graph of y = (x + 1)^2 has been reflected in the
x-axis.



The graph of y = 2 - (x + 1)^2. The graph of y = -(x + 1)^2 has been shifted up by
2 units.
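
The three steps above can also be checked numerically. The following is only a sketch: the helper names shift_x, reflect_in_x_axis and shift_y are our own labels for the modifications described in this Chapter, not notation used in the text.

```python
# Build y = 2 - (x + 1)^2 from y = x^2 one modification at a time.
# The helper names are illustrative assumptions, not from the text.

def base(x):
    return x ** 2

def shift_x(f, a):
    """Return g with g(x) = f(x - a): the graph of f moved a units to the right."""
    return lambda x: f(x - a)

def reflect_in_x_axis(f):
    """Return g with g(x) = -f(x)."""
    return lambda x: -f(x)

def shift_y(f, k):
    """Return g with g(x) = f(x) + k: the graph of f moved up k units."""
    return lambda x: f(x) + k

step1 = shift_x(base, -1)          # y = (x + 1)^2, shifted 1 unit to the left
step2 = reflect_in_x_axis(step1)   # y = -(x + 1)^2
step3 = shift_y(step2, 2)          # y = 2 - (x + 1)^2

# Tabulate the three stages at a few integer values of x.
for x in range(-3, 2):
    print(x, step1(x), step2(x), step3(x))
```

The order matters here: reflecting before adding 2 gives 2 - (x + 1)^2, whereas adding 2 first and then reflecting would give -(x + 1)^2 - 2.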


Similarly, we can sketch the graph of (x - h)^2 + (y - k)^2 = r^2 from the graph of x^2 + y^2 = r^2.
Replacing x by x - h shifts the graph sideways h units. Replacing y by y - k shifts the
graph up or down k units. (We remarked before that y = f(x) + k could be written as
y - k = f(x).)


For example, we can use the graph of the circle of radius 3, x^2 + y^2 = 9, to sketch the
graph of (x - 2)^2 + (y + 4)^2 = 9.

The graph of x^2 + y^2 = 9.
This is a circle with centre (0, 0) and radius 3.



The graph of (x - 2)^2 + (y + 4)^2 = 9.
This is a circle with centre (2, -4) and radius 3.


Replacing x by x - 2 has the effect of shifting the graph of x^2 + y^2 = 9 two units to the
right. Replacing y by y + 4 shifts it down 4 units.

2.6 Graphing by addition of ordinates 

We can sketch the graph of functions such as y = |x| + |x - 2| by drawing the graphs of
both y = |x| and y = |x - 2| on the same axes, then adding the corresponding y-values.

The graph of y = |x| + |x - 2|.


At each value of x the y-values of y = |x| and y = |x - 2| have been added. This allows
us to sketch the graph of y = |x| + |x - 2|.

This technique for sketching graphs is very useful for sketching the graph of the sum of 
two trigonometric functions. 
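
Adding ordinates can be imitated with a short table of values. This is a minimal sketch for the example y = |x| + |x - 2|; the function names f, g and total are our own.

```python
# Addition of ordinates: add the y-values of two graphs at the same x.

def f(x):
    return abs(x)

def g(x):
    return abs(x - 2)

def total(x):
    # The ordinate of the sum is the sum of the ordinates.
    return f(x) + g(x)

xs = [-1, 0, 1, 2, 3]
for x in xs:
    print(x, f(x), g(x), total(x))
```

In this table the sum takes the same value, 2, at x = 0, 1 and 2, which is why the sketched graph is flat between x = 0 and x = 2.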


2.7 Using graphs to solve equations 


We can solve equations of the form f(x) = k by sketching y = f(x) and the horizontal line
y = k on the same axes. The solution to the equation f(x) = k is found by determining
the x-values of any points of intersection of the two graphs.

For example, to solve |x - 3| = 2 we sketch y = |x - 3| and y = 2 on the same axes.



The x-values of the points of intersection are 1 and 5. Therefore |x - 3| = 2 when x = 1
or x = 5.

Example 

The graph of y = f(x) is sketched below.



For what values of k does the equation f(x) = k have

1. 1 solution

2. 2 solutions

3. 3 solutions?

Solution

If we draw a horizontal line y = k across the graph y = f(x), it will intersect once when
k > 0 or k < -4, twice when k = 0 or k = -4 and three times when -4 < k < 0.
Therefore the equation f(x) = k will have

1. 1 solution if k > 0 or k < -4

2. 2 solutions if k = 0 or k = -4

3. 3 solutions if -4 < k < 0.

2.8 Exercises 

1. Sketch the following: 

a. y = x^2 b. y = (1/2)x^2 c. y = -x^2 d. y = (x + 1)^2

2. Sketch the following: 

a. y=\ b. y=^ c. y==£ d. y = ^ + 2 

3. Sketch the following: 

a. y = x^3 b. y = |x^3 - 2| c. y = 3 - (x - 1)^3

4. Sketch the following:

a. y = |x| b. y = 2|x - 2| c. y = 4 - |x|

5. Sketch the following:

a. x^2 + y^2 = 16 b. x^2 + (y + 2)^2 = 16 c. (x - 1)^2 + (y - 3)^2 = 16

6. Sketch the following: 



7. Show that (x - 1)/(x - 2) = 1/(x - 2) + 1.

Hence sketch the graph of y = (x - 1)/(x - 2).

8. Sketch y = *±i 

9. Graph the following relations in the given interval: 

a. y = |x| + x + 1 for -2 < x < 2 [Hint: Sketch by adding ordinates]

b. y = |x| + |x - 1| for -2 < x < 3

c. y = 2^x + 2^(-x) for -2 < x < 2

d. |x - y| = 1 for -1 < x < 3.

10. Sketch the function f(x) = |x^2 - 1| - 1.

11. Given y = f(x) as sketched below, sketch
a. y = 2f(x)
b. y = -f(x)
c. y = f(-x)
d. y = f(x) + 4
e. y = f(x - 3)
f. y = f(x + 1) - 2
g. y = 3 - 2f(x - 3)
h. y = |f(x)|



12. By sketching graphs solve the following equations: 

a. |2x| = 4 

b. -4— = —1 

x—2 

S 2 
C. X = X 

d. x 2 = 4 

X 

13. Solve |x - 2| = 3.

a. algebraically 

b. geometrically. 

14. The parabolas y = (x - 1)^2 and y = (x - 3)^2 intersect at a point P. Find the
coordinates of P.

15. Sketch the circle x^2 + y^2 - 2x - 14y + 25 = 0. [Hint: Complete the squares.] Find
the values of k so that the line y = k intersects the circle in two distinct points.

16. Solve = i 5 using a graph. 

17. Find all real numbers x for which |x - 2| = |x + 2|.

18. Given that Q(p) = p^2 - p, find possible values of n if Q(n) = 2.

19. Solve |x - 4| = 2x.

a. algebraically

b. geometrically.

2.9 Even and odd functions


Definition:

A function, y = f(x), is even if f(x) = f(-x) for all x in the domain of f.

Geometrically, an even function is symmetrical about the y-axis (it has line symmetry).

The function f(x) = x^2 is an even function as f(-x) = (-x)^2 = x^2 = f(x) for all values
of x. We illustrate this on the following graph.



The graph of y = x^2.


Definition:

A function, y = f(x), is odd if f(-x) = -f(x) for all x in the domain of f.

Geometrically, an odd function is symmetrical about the origin (it has rotational symmetry).

The function f(x) = x is an odd function as f(-x) = -x = -f(x) for all values of x.
This is illustrated on the following graph.



The graph of y = x.

Example


Decide whether the following functions are even, odd or neither.
1. f(x) = 3x^2 - 4
2. g(x) = 1/(2x)
3. f(x) = x^3 - x^2.


Solution

1. f(-x) = 3(-x)^2 - 4 = 3x^2 - 4 = f(x)

The function f(x) = 3x^2 - 4 is even.

2. g(-x) = 1/(2(-x)) = 1/(-2x) = -1/(2x) = -g(x)

Therefore, the function g is odd.

3. f(-x) = (-x)^3 - (-x)^2 = -x^3 - x^2

This function is neither even (since -x^3 - x^2 ≠ x^3 - x^2) nor odd (since -x^3 - x^2 ≠
-(x^3 - x^2)).
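
The definitions can be tried out numerically. The sketch below compares f(-x) with f(x) and with -f(x) at a few sample points; the function name classify_parity and the choice of sample points are our own assumptions. Agreement at sample points only suggests that a function is even or odd; a proof still needs the algebra shown in the solution above.

```python
import math

def classify_parity(f, samples=(0.5, 1.0, 1.5, 2.0, 3.0)):
    """Guess whether f is even, odd or neither by testing a few sample points.

    This is a numerical check only, not a proof.
    """
    looks_even = all(math.isclose(f(-x), f(x)) for x in samples)
    looks_odd = all(math.isclose(f(-x), -f(x)) for x in samples)
    if looks_even:
        return "even"
    if looks_odd:
        return "odd"
    return "neither"

print(classify_parity(lambda x: 3 * x ** 2 - 4))    # first function of the example
print(classify_parity(lambda x: 1 / (2 * x)))       # second function of the example
print(classify_parity(lambda x: x ** 3 - x ** 2))   # third function of the example
```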


Example 

Sketched below is part of the graph of y = f(x).



Complete the graph if y = f(x) is

1. odd

2. even.

Solution



y = f(x) is an odd function.



y = f(x) is an even function.


2.10 Increasing and decreasing functions 

Here we will introduce the concepts of increasing and decreasing functions. In Chapter 5 
we will relate these concepts to the derivative of a function. 

Definition: 


A function is increasing on an interval I if, for all a and b in I such that a < b,

f(a) < f(b).


The function y = 2^x is an example of a function that is increasing over its domain. The
function y = x^2 is increasing for all real x > 0.

The graph of y = x^2. This function is increasing on the interval x > 0.


Notice that when a function is increasing its graph has a positive slope.

Definition:

A function is decreasing on an interval I if, for all a and b in I such that a < b,

f(a) > f(b).


The function y = 2^(-x) is decreasing over its domain. The function y = x^2 is decreasing on
the interval x < 0.




The graph of y = 2^(-x). This function is
decreasing for all real x.


The graph of y = x^2. This function is
decreasing on the interval x < 0.


Notice that if a function is decreasing then its graph has a negative slope.
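
The two definitions translate directly into a comparison of function values at increasing inputs. The following sketch tests the condition on a finite list of sample points only, so it can show that a function is not increasing on an interval but cannot prove that it is. The names is_increasing_on and is_decreasing_on are hypothetical.

```python
def is_increasing_on(f, points):
    """True if f(a) < f(b) for every consecutive pair a < b in the sorted sample."""
    values = [f(x) for x in sorted(points)]
    return all(a < b for a, b in zip(values, values[1:]))

def is_decreasing_on(f, points):
    """True if f(a) > f(b) for every consecutive pair a < b in the sorted sample."""
    values = [f(x) for x in sorted(points)]
    return all(a > b for a, b in zip(values, values[1:]))

grid = [x / 2 for x in range(-6, 7)]          # -3, -2.5, ..., 3
right_half = [x for x in grid if x >= 0]
left_half = [x for x in grid if x <= 0]

print(is_increasing_on(lambda x: 2 ** x, grid))
print(is_decreasing_on(lambda x: 2 ** (-x), grid))
print(is_increasing_on(lambda x: x ** 2, right_half))
print(is_decreasing_on(lambda x: x ** 2, left_half))
print(is_increasing_on(lambda x: x ** 2, grid))   # fails: x^2 is not increasing on all of the grid
```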


2.11 Exercises 

1. Given the graph below of y = f(x):

a. State the domain and range.

b. Where is the graph

i increasing?

ii decreasing?

c. If k is a constant, find the values of k such that f(x) = k has

i no solutions

ii 1 solution

iii 2 solutions

iv 3 solutions

v 4 solutions.

d. Is y = f(x) even, odd or neither?



2. Complete the following functions if they are defined to be (a) even (b) odd. 




y = f(x) y = g(x ) 


3. Determine whether the following functions are odd, even or neither. 


a. y = x^4 + 2

b. y = √(4 - x^2)

c. y = 2^x

d. y = x^3 + 3x

e. 

y = 

X 

f. 

y = 

1 

g- 

y = 

l 

h. 

y = 

x 

X 2 

to 

1 

x 2 + 4 

x 3 + 3 

i. y = 2^x + 2^(-x)

j. y = |x - 1| + |x + 1|







4. Given y = f(x) is even and y = g(x) is odd, prove

a. if h(x) = f(x) · g(x) then h(x) is odd

b. if h(x) = (g(x))^2 then h(x) is even

c. if h(x) = f(x)/g(x), g(x) ≠ 0, then h(x) is odd

d. if h(x) = f(x) · (g(x))^2 then h(x) is even.


5. Consider the set of all odd functions which are defined at x = 0. Can you prove that
for every odd function in this set f(0) = 0? If not, give a counter-example.





3 Piecewise functions and solving inequalities 


In this Chapter we will discuss functions that are defined piecewise (sometimes called 
piecemeal functions) and look at solving inequalities using both algebraic and graphical 
techniques. 


3.1 Piecewise functions 

3.1.1 Restricting the domain 

In Chapter 1 we saw how functions could be defined on a subinterval of their natural 
domain. This is frequently called restricting the domain of the function. In this Chapter 
we will extend this idea to define functions piecewise. 

Sketch the graph of y = 1 - x^2 for x ≥ 0.



The graph of y = 1 - x^2 for x ≥ 0.


Sketch the graph of y = 1 - x for x < 0.



The graph of y = 1 - x for x < 0.

We can now put these pieces together to define a function of the form

f(x) = { 1 - x^2   for x ≥ 0
       { 1 - x     for x < 0

We say that this function is defined piecewise. First note that it is a function; each value 
of x in the domain is assigned exactly one value of y. This is easy to see if we graph the 
function and use the vertical line test. We graph this function by graphing each piece of 
it in turn. 



The graph shows that f defined in this way is a function. The two pieces of y = f(x)
meet, so f is a continuous function.
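
A piecewise definition corresponds naturally to a conditional. The sketch below encodes the function above, assuming that the first rule, 1 - x^2, applies at x = 0 itself (so that every x is assigned exactly one value).

```python
def f(x):
    """Piecewise function: 1 - x^2 for x >= 0 (assumed to include 0), 1 - x for x < 0."""
    if x >= 0:
        return 1 - x ** 2
    return 1 - x

# Each input falls into exactly one branch, so f assigns exactly one output.
for x in [-2, -1, 0, 1, 2]:
    print(x, f(x))
```

Both rules give the value 1 as x approaches 0, which is the numerical counterpart of the two pieces of the graph meeting.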


The absolute value function

|x| = { x    for x ≥ 0
      { -x   for x < 0

is another example of a piecewise function.


Example 


Sketch the function

f(x) = { x^2 + 1   for x ≥ 0
       { 2         for x < 0

Solution



This function is not continuous at x = 0 as the two branches of the graph do not meet.

Notice that we have put an open square (or circle) around the point (0, 2) and a solid
square (or circle) around the point (0, 1). This is to make it absolutely clear that f(0) = 1
and not 2. When defining a function piecewise, we must be extremely careful to assign
to each x exactly one value of y.


3.2 Exercises 


1. For the function

f(x) = { 1 - x^2   for x ≥ 0
       { 1 - x     for x < 0

evaluate

a. 2f(-1) + f(2)

b. f(a^2)


2. For the function given in 1, solve f(x) = 2.


3. Below is the graph of y = g(x). Write down the rules which define g(x) given that
its pieces are hyperbolic, circular and linear.

7. McMaths burgers are to modernise their logo as shown below.



Write down a piecewise function that represents this logo using (a) 4 (b) 3 (c) 2
pieces (i.e. rules that define the function).


8. a. The following piecewise function is of the form

f(x) = { ax^2 + b   for 0 < x < 2
       { cx + d     for x > 2



Determine the values of a, b, c and d.

b. Complete the graph so that f(x) is an odd function defined for all real x, x ≠ 0.

c. Write down the equations that now define f(x), x ≠ 0.

3.3 Inequalities


We can solve inequalities using both algebraic and graphical methods. Sometimes it is 
easier to use an algebraic method and sometimes a graphical one. For the following 
examples we will use both, as this allows us to make the connections between the algebra 
and the graphs. 


Algebraic method

1. Solve 3 - 2x ≥ 1.

This is a (2 Unit) linear inequality.
Remember to reverse the inequality
sign when multiplying or dividing
by a negative number.

3 - 2x ≥ 1
-2x ≥ -2
x ≤ 1


2. Solve x^2 - 4x + 3 < 0.

This is a (2 Unit) quadratic inequality.
Factorise and use a number line.
x^2 - 4x + 3 < 0
(x - 3)(x - 1) < 0

The critical values are 1 and 3,
which divide the number line into
three intervals. We take points in
each interval to determine the sign
of (x - 3)(x - 1); e.g. use x = 0,
x = 2 and x = 4 as test values.

[Number line: the product is positive for x < 1, negative for 1 < x < 3 and positive for x > 3.]

Thus, the solution is 1 < x < 3.


Graphical method



1. When is the line y = 3 - 2x above or
on the horizontal line y = 1? From the
graph, we see that this is true for x ≤ 1.


2. Let y = x^2 - 4x + 3.



When does the parabola have negative
y-values? OR When is the parabola under
the x-axis? From the graph, we see
that this happens when 1 < x < 3.

Algebraic method

3. Solve 1/(x - 4) ≤ 1.

This is a 3 Unit inequality. There
is a variable in the denominator.
Remember that a denominator can
never be zero, so in this case x ≠ 4.
First multiply by the square of the
denominator (which is positive, so the
inequality sign is unchanged):

x - 4 ≤ (x - 4)^2, x ≠ 4
x - 4 ≤ x^2 - 8x + 16
0 ≤ x^2 - 9x + 20
0 ≤ (x - 4)(x - 5)

Mark the critical values on the number
line and test x = 0, x = 4.5 and
x = 6.

[Number line: (x - 4)(x - 5) is positive for x < 4, negative for 4 < x < 5 and positive for x > 5.]

Therefore, x < 4 or x ≥ 5.


4. Solve x - 3 < 10/x.

Consider x - 3 = 10/x, x ≠ 0.
Multiplying by x we get

x^2 - 3x = 10
x^2 - 3x - 10 = 0
(x - 5)(x + 2) = 0

Therefore, the critical values are
-2, 0 and 5, which divide the number
line into four intervals. We can
use x = -3, x = -1, x = 1 and
x = 6 as test values in the inequality.
The points x = -3 and x = 1
satisfy the inequality, so the solution
is x < -2 or 0 < x < 5.

(Notice that we had to include 0 as
one of our critical values.)



Graphical method

3. y = 1/(x - 4) is a hyperbola with vertical asymptote
at x = 4. To solve our inequality we
need to find the values of x for which
the hyperbola lies on or under the line
y = 1. (5, 1) is the point of intersection.
So, from the graph we see that 1/(x - 4) ≤ 1
when x < 4 or x ≥ 5.


4. Sketch y = x - 3 and then y = 10/x. Note
that the second of these functions is not
defined for x = 0.



For what values of x does the line lie
under the hyperbola? From the graph,
we see that this happens when x < -2
or 0 < x < 5.
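
The test-value method used in the algebraic solutions can be sketched in code. The function name solve_by_test_points and its margin parameter are our own assumptions. The routine takes the critical values as given, tries one test value inside each interval they create, and reports the open intervals on which the inequality holds. The critical values themselves still have to be checked by hand.

```python
def solve_by_test_points(holds, critical_values, margin=1.0):
    """Return the open intervals between critical values on which holds(x) is true.

    holds: a function returning True when the inequality is satisfied at x.
    critical_values: points where the two sides are equal or undefined.
    """
    cuts = sorted(critical_values)
    bounds = [float("-inf")] + cuts + [float("inf")]
    solution = []
    for left, right in zip(bounds, bounds[1:]):
        # Choose one test value strictly inside the interval (left, right).
        if left == float("-inf"):
            test = right - margin
        elif right == float("inf"):
            test = left + margin
        else:
            test = (left + right) / 2
        if holds(test):
            solution.append((left, right))
    return solution

# Example 4: x - 3 < 10/x with critical values -2, 0 and 5.
example4 = solve_by_test_points(lambda x: x - 3 < 10 / x, [-2, 0, 5])
print(example4)

# Example 2: (x - 3)(x - 1) < 0 with critical values 1 and 3.
example2 = solve_by_test_points(lambda x: (x - 3) * (x - 1) < 0, [1, 3])
print(example2)
```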






Example 


Sketch the graph of y = |2x - 6|.
Hence, where possible,
a. Solve

i |2x - 6| = 2x
ii |2x - 6| > 2x
iii |2x - 6| = x + 3
iv |2x - 6| < x + 3
v |2x - 6| = x - 3


b. Determine the values of k for which |2x - 6| = x + k has exactly two solutions.


Solution


f(x) = |2x - 6| = { 2x - 6      for x ≥ 3
                  { -(2x - 6)   for x < 3



a. i Mark in the graph of y = 2x. It is parallel to one arm of the absolute value graph.

It has one point of intersection with y = |2x - 6| = -2x + 6 (x < 3) at x = 1.5.

ii When is the absolute value graph above the line y = 2x? From the graph, when

x < 1.5.

iii y = x + 3 intersects y = |2x - 6| twice.

To solve |2x - 6| = x + 3, take |2x - 6| = 2x - 6 = x + 3 when x ≥ 3. This gives
us the solution x = 9. Then take |2x - 6| = -2x + 6 = x + 3 when x < 3, which
gives us the solution x = 1.

iv When is the absolute value graph below the line y = x + 3?

From the graph, 1 < x < 9.

v y = x - 3 intersects the absolute value graph at x = 3 only.

b. k represents the y-intercept of the line y = x + k. When k = -3, there is one point of
intersection. (See (a) (v) above.) For k > -3, lines of the form y = x + k will have
two points of intersection. Hence |2x - 6| = x + k will have two solutions for k > -3.
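
Part b can be explored by solving each branch of the absolute value separately, as in part a iii. The sketch below does this for a general k; the function name solve_abs_equation is hypothetical, and exact fractions are used so that the branch conditions are tested without rounding.

```python
from fractions import Fraction

def solve_abs_equation(k):
    """Solve |2x - 6| = x + k by treating the two branches separately."""
    k = Fraction(k)
    solutions = []
    # Branch x >= 3: 2x - 6 = x + k gives x = k + 6.
    x = k + 6
    if x >= 3:
        solutions.append(x)
    # Branch x < 3: -(2x - 6) = x + k gives x = (6 - k)/3.
    x = (6 - k) / 3
    if x < 3:
        solutions.append(x)
    return sorted(solutions)

for k in [3, 0, -3, -4]:
    print(k, solve_abs_equation(k))
```

For k = 3 this reproduces the solutions x = 1 and x = 9 found in part a iii, and for k = -3 the single solution x = 3 of part a v.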


3.4 Exercises 


1 . 


Solve 

a. x 2 < Ax 

b. < 1 

p+3 — 


9— x- 1 


> -1 


2. a. Sketch the graph of y = 4x(x - 3).

b. Hence solve 4x(x - 3) < 0.


3. a. Find the points of intersection of the graphs y = 5 - x and y = 4/x.

b. On the same set of axes, sketch the graphs of y = 5 - x and y = 4/x.

c. Using part (b), or otherwise, write down all the values of x for which

5 - x > 4/x


4. a. Sketch the graph of y = 2^x.

b. Solve 2 X <\. 

c. Suppose 0 < a < b and consider the points A(a, 2^a) and B(b, 2^b) on the graph of
y = 2^x. Find the coordinates of the midpoint M of the segment AB.

Explain why

(2^a + 2^b)/2 > 2^((a + b)/2)

5. a. Sketch the graphs of y = x and y = |x - 5| on the same diagram.

b. Solve |x - 5| > x.

c. For what values of m does mx = |x - 5| have exactly

i two solutions

ii no solutions

6. Solve 5x^2 - 6x - 3 < |8x|.

4 Polynomials


Many of the functions we have been using so far have been polynomials. In this Chapter
we will study them in more detail.

Definition

A real polynomial, P(x), of degree n is an expression of the form

P(x) = p_n x^n + p_(n-1) x^(n-1) + p_(n-2) x^(n-2) + ... + p_2 x^2 + p_1 x + p_0

where p_n ≠ 0, p_0, p_1, ..., p_n are real and n is an integer ≥ 0.

All polynomials are defined for all real x and are continuous functions.

We are familiar with the quadratic polynomial, Q(x) = ax^2 + bx + c where a ≠ 0. This
polynomial has degree 2.

The function f(x) = √x + x is not a polynomial as it has a power which is not an integer
≥ 0 and so does not satisfy the definition.

4.1 Graphs of polynomials and their zeros 

4.1.1 Behaviour of polynomials when |x| is large 

One piece of information that can be a great help when sketching a polynomial is the 
way it behaves for values of x when |x| is large. That is, values of x which are large in 
magnitude. 

The term of the polynomial with the highest power of x is called the leading or dominant
term. For example, in the polynomial P(x) = x^6 - 3x^4 - 1, the term x^6 is the dominant
term.

When |x| is large, the dominant term determines how the graph behaves as it is so much
larger in magnitude than all the other terms.
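
A quick numerical check makes this concrete. The sketch below compares P(x) = x^6 - 3x^4 - 1 with its dominant term x^6 by printing their ratio for values of x of increasing magnitude; the choice of sample values is our own.

```python
def P(x):
    return x ** 6 - 3 * x ** 4 - 1

def dominant(x):
    return x ** 6

# The ratio P(x) / x^6 approaches 1 as |x| grows, in both directions.
for x in [2, 10, 100, -100]:
    print(x, P(x) / dominant(x))
```

For x = 2 the lower-order terms still matter a great deal, but by |x| = 100 the ratio differs from 1 by less than one part in a thousand.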

How the graph behaves for |x| large depends on the power and coefficient of the dominant
term.

There are four possibilities which we summarise in the following diagrams:

[Four diagrams showing the behaviour of the graph for large |x| in each case.]

1. Dominant term with even power and positive coefficient, e.g. y = x^2.

2. Dominant term with even power and negative coefficient, e.g. Q(x) = -x^2.

3. Dominant term with odd power and positive coefficient, e.g. y = x^3.

4. Dominant term with odd power and negative coefficient, e.g. Q(x) = -x^3.

This gives us a good start to graphing polynomials. All we need do now is work out what 
happens in the middle. In Chapter 5 we will use calculus methods to do this. Here we 
will use our knowledge of the roots of polynomials to help complete the picture. 

4.1.2 Polynomial equations and their roots 

If, for a polynomial P(x), P(k) = 0 then we can say

1. x = k is a root of the equation P(x) = 0. 

2. x = k is a zero of P(x). 

3. k is an x-intercept of the graph of P(x). 

4.1.3 Zeros of the quadratic polynomial 

The quadratic polynomial equation Q(x) = ax^2 + bx + c = 0 has two roots that may be:

1. real (rational or irrational) and distinct,

2. real (rational or irrational) and equal,

3. complex (not real).

We will illustrate all of these cases with examples, and will show the relationship between
the nature and number of zeros of Q(x) and the x-intercepts (if any) on the graph.

1. Let Q(x) = x^2 - 4x + 3.

We find the zeros of Q(x) by solving the
equation Q(x) = 0.

x^2 - 4x + 3 = 0
(x - 1)(x - 3) = 0
Therefore x = 1 or 3.

The roots are rational (hence real) and
distinct.




2. Let Q(x) = x^2 - 4x - 3.

Solving the equation Q(x) = 0 we get,
x^2 - 4x - 3 = 0
x = (4 ± √(16 + 12))/2

Therefore x = 2 ± √7.

The roots are irrational (hence real) and
distinct.



3. Let Q(x) = x^2 - 4x + 4.

Solving the equation Q(x) = 0 we get,
x^2 - 4x + 4 = 0
(x - 2)^2 = 0
Therefore x = 2.

The roots are rational (hence real) and
equal. Q(x) = 0 has a repeated or double
root at x = 2.


Notice that the graph turns at the double
root x = 2.


4. Let Q(x) = x^2 - 4x + 5.

Solving the equation Q(x) = 0 we get,
x^2 - 4x + 5 = 0
x = (4 ± √(16 - 20))/2

Therefore x = (4 ± √(-4))/2.

There are no real roots. In this case the
roots are complex.


Notice that the graph does not intersect
the x-axis. That is, Q(x) > 0 for all real
x. Therefore Q is positive definite.

We have given above four examples of quadratic polynomials to illustrate the relationship
between the zeros of the polynomials and their graphs.

In particular we saw that:

i if the quadratic polynomial has two real distinct zeros, then the graph of the polynomial
cuts the x-axis at two distinct points;

ii if the quadratic polynomial has a real double (or repeated) zero, then the graph sits
on the x-axis;

iii if the quadratic polynomial has no real zeros, then the graph does not intersect the
x-axis at all.

So far, we have only considered quadratic polynomials where the coefficient of the x^2
term is positive, which gives us a graph which is concave up. If we consider polynomials
Q(x) = ax^2 + bx + c where a < 0 then we will have a graph which is concave down.

For example, the graph of Q(x) = -(x^2 - 4x + 4) is the reflection in the x-axis of the
graph of Q(x) = x^2 - 4x + 4. (See Chapter 2.)



The graph of Q(x) = x^2 - 4x + 4.

The graph of Q(x) = -(x^2 - 4x + 4).
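
The cases illustrated in this section can be told apart by the sign of the quantity b^2 - 4ac under the square root in the quadratic formula (usually called the discriminant, a term not used in the text above). The following sketch assumes integer coefficients, so that a perfect-square discriminant means rational roots; the function name classify_quadratic is our own.

```python
import math

def classify_quadratic(a, b, c):
    """Describe the roots of ax^2 + bx + c = 0 for integers a, b, c with a != 0."""
    disc = b * b - 4 * a * c
    if disc < 0:
        return "no real roots (complex)"
    if disc == 0:
        return "real and equal"
    if math.isqrt(disc) ** 2 == disc:
        # A perfect square under the root sign gives rational roots.
        return "real, rational and distinct"
    return "real, irrational and distinct"

# The four examples of this section.
for coeffs in [(1, -4, 3), (1, -4, -3), (1, -4, 4), (1, -4, 5)]:
    print(coeffs, classify_quadratic(*coeffs))
```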

4.1.4 Zeros of cubic polynomials 

A real cubic polynomial has an equation of the form 

P(x) = ax^3 + bx^2 + cx + d

where a ≠ 0 and a, b, c and d are real. It has 3 zeros which may be:

i 3 real distinct zeros; 

ii 3 real zeros, all of which are equal (3 equal zeros); 

iii 3 real zeros, 2 of which are equal; 

iv 1 real zero and 2 complex zeros. 

We will illustrate these cases with the following examples:

1. Let Q(x) = 3x^3 - 3x.

Solving the equation Q(x) = 0 we get:

3x^3 - 3x = 0
3x(x - 1)(x + 1) = 0

Therefore x = -1 or 0 or 1.
The roots are real (in fact rational) and
distinct.


2. Let Q(x) = x^3.

Solving Q(x) = 0 we get that x^3 = 0.
We can write this as (x - 0)^3 = 0.

So, this equation has three equal real
roots at x = 0.


3. Let Q(x) = x^3 - x^2.

Solving the equation Q(x) = 0 we get,
x^3 - x^2 = 0
x^2(x - 1) = 0
Therefore x = 0 or 1.

The roots are real with a double root at
x = 0 and a single root at x = 1.


The graph turns at the double root.

4. Let Q(x) = x^3 + x.

Solving the equation Q(x) = 0 we get,
x^3 + x = 0
x(x^2 + 1) = 0
Therefore x = 0.

There is one real root at x = 0.
x^2 + 1 = 0 does not have any real solutions.


The graph intersects the x-axis once
only.

Again, in the above examples we have looked only at cubic polynomials where the coefficient
of the x^3 term is positive. If we consider the polynomial P(x) = -x^3 then the graph
of this polynomial is the reflection of the graph of P(x) = x^3 in the x-axis.




The graph of Q(x) = x^3.


The graph of Q(x) = -x^3.


4.2 Polynomials of higher degree 

We will write down a few rules that we can use when we have a polynomial of degree > 3.
If P(x) is a real polynomial of degree n then:

1. P(x) = 0 has at most n real roots;

2. if P(x) = 0 has a repeated root with an even power then the graph of P(x) turns at
this repeated root;

3. if P(x) = 0 has a repeated root with an odd power then the graph of P(x) has a
horizontal point of inflection at this repeated root.

For example, 1. tells us that a quartic polynomial equation f(x) = 0
has at most 4 real roots.

We can illustrate 2. by sketching f(x) = x(x - 2)^2(x + 1). Notice how the graph sits
on the x-axis at x = 2.



The graph of f(x) = x(x + 1)(x - 2)^2.

We illustrate 3. by sketching the graph of f(x) = x(x - 2)^3. Notice the horizontal point
of inflection at x = 2.



The graph of f(x) = x(x - 2)^3.
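
One way to see the difference between rules 2 and 3 is to look at the sign of the polynomial just to the left and just to the right of the repeated root. This is a rough numerical sketch; the helper changes_sign_at and the step size h are our own assumptions.

```python
def sign(value):
    """Return -1, 0 or 1 according to the sign of value."""
    return (value > 0) - (value < 0)

def changes_sign_at(f, root, h=1e-3):
    """Compare the sign of f slightly to the left and right of a root."""
    return sign(f(root - h)) != sign(f(root + h))

def even_power(x):
    return x * (x - 2) ** 2 * (x + 1)   # repeated root with an even power at x = 2

def odd_power(x):
    return x * (x - 2) ** 3             # repeated root with an odd power at x = 2

print(changes_sign_at(even_power, 2))   # the graph turns: it touches the axis without crossing
print(changes_sign_at(odd_power, 2))    # the graph crosses the axis at the point of inflection
print(changes_sign_at(even_power, 0))   # a single root: the graph crosses the axis
```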

4.3 Exercises 

1. Sketch the graphs of the following polynomials if y = P(x) is:

a. x(x + 1)(x - 3)

b. x(x + 1)(3 - x)

c. (x + 1)^2(x - 3)

d. (x + 1)(x^2 - 4x + 5)

2. The graphs of the following quartic polynomials are sketched below. Match the graph
with the polynomial.

a. y = x^4 b. y = x^4 - 1 c. y = x^4 + 1 d. y = 1 - x^4 e. y = (x - 1)^4 f. y = (x + 1)^4

3. Sketch the graphs of the following quartic polynomials if y = C(x) is:

a. x(x - 1)(x + 2)(x + 3)

b. x(x - 1)(x + 2)(3 - x)

c. x^2(x - 1)(x - 3)

d. (x + 1)^2(x - 3)^2

e. (x + 1)^3(x - 3)

f. (x + 1)^3(3 - x)

g. x(x + 1)(x^2 - 4x + 5)

h. x^2(x^2 - 4x + 5).

4. By sketching the appropriate polynomial, solve:

a. x^2 - 4x - 12 < 0

b. (x + 2)(x - 3)(5 - x) > 0

c. (x + 2)^2(5 - x) > 0

d. (x + 2)^3(5 - x) > 0.

5. For what values of k will P(x) > 0 for all real x if P(x) = x^2 - 4x - 12 + k?

6. The diagrams show the graph of y = P(x) where P(x) = a(x - b)(x - c)^d.

In each case determine possible values for a, b, c and d.


[Six graphs, labelled a. to f.]




7. The graph of the polynomial y = f(x) is given below. It has a local maximum and
minimum as marked. Use the graph to answer the following questions.

a. State the roots of f(x) = 0.

b. What is the value of the repeated root?

c. For what values of k does the equation f(x) = k have exactly 3 solutions?

d. Solve the inequality f(x) < 0.

e. What is the least possible degree of f(x)?

f. State the value of the constant of f(x).

g. For what values of k is f(x) + k > 0 for all real x?



The graph of the polynomial y = f(x)

4.4 Factorising polynomials 

So far, for the most part, we have looked at polynomials which were already factorised. In
this section we will look at methods which will help us factorise polynomials with degree
> 2.


4.4.1 Dividing polynomials


Suppose we have two polynomials P(x) and A(x), with the degree of P(x) > the degree
of A(x), and P(x) is divided by A(x). Then

P(x)/A(x) = Q(x) + R(x)/A(x),

where Q(x) is a polynomial called the quotient and R(x) is a polynomial called the
remainder, with the degree of R(x) < degree of A(x).


We can rewrite this as

P(x) = A(x) · Q(x) + R(x).


For example: If P(x) = 2x^3 + 4x - 3 and A(x) = x - 2, then P(x) can be divided by A(x)
as follows:

          2x^2 + 4x + 12
x - 2 | 2x^3 + 0x^2 + 4x - 3
        2x^3 - 4x^2

               4x^2 + 4x - 3
               4x^2 - 8x

                      12x - 3
                      12x - 24
                            21

The quotient is 2x^2 + 4x + 12 and the remainder is 21. We have

(2x^3 + 4x - 3)/(x - 2) = 2x^2 + 4x + 12 + 21/(x - 2).


This can be written as

2x^3 + 4x - 3 = (x - 2)(2x^2 + 4x + 12) + 21.


Note that the degree of the "polynomial" 21 is 0.
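
The long division above follows a fixed routine: divide the leading terms, subtract the corresponding multiple of the divisor, and repeat with what is left. The sketch below carries out that routine on lists of coefficients written from the highest power down (so 2x^3 + 4x - 3 becomes [2, 0, 4, -3]). The function name poly_divmod and the list representation are our own choices.

```python
from fractions import Fraction

def poly_divmod(dividend, divisor):
    """Divide one polynomial by another.

    Both are lists of coefficients, highest power first.
    Returns (quotient, remainder) as coefficient lists.
    """
    remainder = list(dividend)
    quotient = []
    while len(remainder) >= len(divisor):
        # Divide the leading terms to get the next term of the quotient.
        factor = Fraction(remainder[0]) / divisor[0]
        quotient.append(factor)
        # Subtract factor * divisor, aligned with the leading term.
        for i, coeff in enumerate(divisor):
            remainder[i] -= factor * coeff
        remainder.pop(0)   # the leading term has been cancelled
    return quotient, remainder

# (2x^3 + 0x^2 + 4x - 3) divided by (x - 2)
q, r = poly_divmod([2, 0, 4, -3], [1, -2])
print(q, r)
```

Note the explicit 0 for the missing x^2 term, which plays the same role as the 0x^2 placeholder written in the long division.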


4.4.2 The Remainder Theorem 

If the polynomial f(x) is divided by (x - a) then the remainder is f(a).

Proof:

Following the above, we can write

f(x) = A(x) · Q(x) + R(x),

where A(x) = (x - a). Since the degree of A(x) is 1, the degree of R(x) is zero. That is,
R(x) = r where r is a constant.

f(x) = (x - a)Q(x) + r where r is a constant.

f(a) = 0 · Q(a) + r
     = r

So, if f(x) is divided by (x - a) then the remainder is f(a).


Example

Find the remainder when P(x) = 3x^4 - x^3 + 30x - 1 is divided by a. x + 1, b. 2x - 1.

Solution

a. Using the Remainder Theorem:

Remainder = P(-1)
          = 3 - (-1) - 30 - 1
          = -27

b.

Remainder = P(1/2)
          = 3(1/2)^4 - (1/2)^3 + 30(1/2) - 1
          = 3/16 - 1/8 + 15 - 1
          = 14 1/16

Example


When the polynomial f(x) is divided by x^2 - 4, the remainder is 5x + 6. What is the
remainder when f(x) is divided by (x - 2)?

Solution

Write f(x) = (x^2 - 4) · g(x) + (5x + 6). Then

Remainder = f(2)
          = 0 · g(2) + 16
          = 16
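
The Remainder Theorem turns the search for a remainder into a substitution, which is easy to check by machine. The sketch below evaluates P(x) = 3x^4 - x^3 + 30x - 1 from the first example at x = -1 and x = 1/2, using exact fractions. The helper name evaluate and the coefficient-list representation (highest power first) are our own assumptions.

```python
from fractions import Fraction

def evaluate(coeffs, x):
    """Evaluate a polynomial given by its coefficients, highest power first.

    Uses nested multiplication (Horner's rule): ((3x - 1)x + 0)x + ...
    """
    result = 0
    for c in coeffs:
        result = result * x + c
    return result

P = [3, -1, 0, 30, -1]   # 3x^4 - x^3 + 0x^2 + 30x - 1

remainder_a = evaluate(P, -1)               # division by x + 1
remainder_b = evaluate(P, Fraction(1, 2))   # division by 2x - 1
print(remainder_a, remainder_b)
```

The second value is printed as an improper fraction; it equals 14 1/16.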

A consequence of the Remainder Theorem is the Factor Theorem which we state below.

4.4.3 The Factor Theorem

If x = a is a zero of f(x), that is f(a) = 0, then (x - a) is a factor of f(x) and f(x) may
be written as

f(x) = (x - a)g(x)

for some polynomial g(x).

Also, if (x - a) and (x - b) are factors of f(x), with a ≠ b, then (x - a)(x - b) is a factor of f(x) and

f(x) = (x - a)(x - b) · Q(x)

for some polynomial Q(x).

Another useful fact about zeros of polynomials is given below for a polynomial of degree
3.

If a (real) polynomial

P(x) = ax^3 + bx^2 + cx + d,

where a ≠ 0 and a, b, c and d are real, has exactly 3 real zeros α, β and γ, then

P(x) = a(x - α)(x - β)(x - γ)     (1)

Furthermore, by expanding the right hand side of (1) and equating coefficients we get:

i α + β + γ = -b/a;

ii αβ + αγ + βγ = c/a;

iii αβγ = -d/a.

This result can be extended for polynomials of degree n. We will give the partial result
for n = 4.

If

P(x) = ax^4 + bx^3 + cx^2 + dx + e

is a polynomial of degree 4 with real coefficients, and P(x) has four real zeros α, β, γ and
δ, then

P(x) = a(x - α)(x - β)(x - γ)(x - δ)

and expanding and equating as above gives

αβγδ = e/a.
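
For the cubic case, the expansion referred to as "expanding and equating coefficients" can be written out in full. The working below is our own, using the same symbols a, b, c, d for the coefficients and α, β, γ for the zeros.

```latex
\begin{align*}
a(x-\alpha)(x-\beta)(x-\gamma)
  &= a\left[x^3 - (\alpha+\beta+\gamma)x^2
      + (\alpha\beta+\alpha\gamma+\beta\gamma)x - \alpha\beta\gamma\right] \\
  &= ax^3 - a(\alpha+\beta+\gamma)x^2
      + a(\alpha\beta+\alpha\gamma+\beta\gamma)x - a\alpha\beta\gamma .
\end{align*}
Comparing with $ax^3 + bx^2 + cx + d$ term by term:
\begin{align*}
b = -a(\alpha+\beta+\gamma) &\;\Longrightarrow\; \alpha+\beta+\gamma = -\frac{b}{a}, \\
c = a(\alpha\beta+\alpha\gamma+\beta\gamma) &\;\Longrightarrow\; \alpha\beta+\alpha\gamma+\beta\gamma = \frac{c}{a}, \\
d = -a\alpha\beta\gamma &\;\Longrightarrow\; \alpha\beta\gamma = -\frac{d}{a}.
\end{align*}
```

The same pattern of alternating signs continues for higher degrees, which is why the product of the four zeros of the quartic equals +e/a.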


If a = 1 and the equation P(x) = 
be a factor of the constant term, 
of a polynomial. That is, we look 
(if any) are roots of the equation 

Example 

Let f(x) = Ax 3 — 8x 2 — x + 2 

a. Factorise f(x). 

b. Sketch the graph of y = f(x). 

c. Solve f(x) > 0. 

Solution 

a. Consider the factors of the constant term, 2. We check to see if ±1 and ±2 are solutions 
of the equation f{x) = 0 by substitution. Since /(2) = 0, we know that (x — 2) is a 
factor of f(x). We use long division to determine the quotient. 

4x 2 — 1 

x — 2 4x 3 — 8x 2 — x + 2 
4x 3 — 8x 2 

— x + 2 

— x + 2 


0 has a root which is an integer, then that integer must 
This gives us a place to start when looking for factors 
at all the factors of the constant term to see which ones 
P{x) = 0. 


So, 

f(x) = (x — 2) (4a ; 2 — 1) 

= (x — 2 ) ( 2 a; — l)( 2 x + 1 ) 
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b. 



The graph of $f(x) = 4x^3 - 8x^2 - x + 2$.
c. $f(x) > 0$ when $-\frac{1}{2} < x < \frac{1}{2}$ or $x > 2$.
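The search for an integer root and the division by $(x - 2)$ in part a can be sketched in a few lines. This is an illustrative sketch only: the function names are hypothetical, and synthetic division is used in place of the long division shown in the text (it produces the same quotient).

```python
def evaluate(coeffs, x):
    """Horner evaluation; coeffs are listed from highest degree down."""
    value = 0
    for c in coeffs:
        value = value * x + c
    return value

def integer_root_candidates(constant):
    """All positive and negative divisors of a nonzero constant term."""
    n = abs(constant)
    divisors = [k for k in range(1, n + 1) if n % k == 0]
    return sorted(divisors + [-k for k in divisors])

def divide_by_linear(coeffs, root):
    """Synthetic division by (x - root): returns (quotient, remainder)."""
    row = []
    carry = 0
    for c in coeffs:
        carry = carry * root + c
        row.append(carry)
    return row[:-1], row[-1]

f = [4, -8, -1, 2]                        # 4x^3 - 8x^2 - x + 2
roots = [r for r in integer_root_candidates(f[-1]) if evaluate(f, r) == 0]
quotient, remainder = divide_by_linear(f, 2)
print(roots, quotient, remainder)
```

The quotient list `[4, 0, -1]` stands for $4x^2 - 1$, which then factorises as a difference of two squares.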


Example 

Show that $(x - 2)$ and $(x - 3)$ are factors of $P(x) = x^3 - 19x + 30$, and hence solve
$x^3 - 19x + 30 = 0$.


Solution

$P(2) = 8 - 38 + 30 = 0$ and $P(3) = 27 - 57 + 30 = 0$ so $(x - 2)$ and $(x - 3)$ are both
factors of $P(x)$ and $(x - 2)(x - 3) = x^2 - 5x + 6$ is also a factor of $P(x)$. Long division
of $P(x)$ by $x^2 - 5x + 6$ gives a quotient of $(x + 5)$.

So,

$P(x) = x^3 - 19x + 30 = (x - 2)(x - 3)(x + 5)$.

Solving $P(x) = 0$ we get $(x - 2)(x - 3)(x + 5) = 0$.

That is, $x = 2$ or $x = 3$ or $x = -5$.

Instead of using long division we could have used the facts that

i the polynomial cannot have more than three real zeros;

ii the product of the zeros must be equal to $-30$.

Let $\alpha$ be the unknown root.

Then $2 \cdot 3 \cdot \alpha = -30$, so that $\alpha = -5$. Therefore the solution of $P(x) = x^3 - 19x + 30 = 0$
is $x = 2$ or $x = 3$ or $x = -5$.
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4.5 Exercises 


1. When the polynomial $P(x)$ is divided by $(x - a)(x - b)$ the quotient is $Q(x)$ and the
remainder is $R(x)$.

a. Explain why $R(x)$ is of the form $mx + c$ where $m$ and $c$ are constants.

b. When a polynomial is divided by $(x - 2)$ and $(x - 3)$, the remainders are 4 and 9
respectively. Find the remainder when the polynomial is divided by $x^2 - 5x + 6$.

c. When $P(x)$ is divided by $(x - a)$ the remainder is $a^2$. Also, $P(b) = b^2$. Find $R(x)$
when $P(x)$ is divided by $(x - a)(x - b)$.

2. a. Divide the polynomial $f(x) = 2x^4 + 13x^3 + 18x^2 + x - 4$ by $g(x) = x^2 + 5x + 2$.

Hence write $f(x) = g(x)q(x) + r(x)$ where $q(x)$ and $r(x)$ are polynomials.

b. Show that $f(x)$ and $g(x)$ have no common zeros. (Hint: Assume that $\alpha$ is a
common zero and show by contradiction that $\alpha$ does not exist.)

3. For the following polynomials,

i factorise

ii solve $P(x) = 0$

iii sketch the graph of $y = P(x)$.

a. $P(x) = x^3 - x^2 - 10x - 8$

b. $P(x) = x^3 - x^2 - 16x - 20$

c. $P(x) = x^3 + 4x^2 - 8$

d. $P(x) = x^3 - x^2 + x - 6$

e. $P(x) = 2x^3 - 3x^2 - 11x + 6$
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5 Solutions to exercises 


1.4 Solutions 

1. a. The domain of $f(x) = \sqrt{9 - x^2}$ is all real $x$ where $-3 \leq x \leq 3$. The range is all
real $y$ such that $0 \leq y \leq 3$.

b. 



The graph of f(x) 


2.

$\frac{\psi(x + h) - \psi(x)}{h} = \frac{(x + h)^2 + 5 - (x^2 + 5)}{h}$

$= \frac{x^2 + 2xh + h^2 + 5 - x^2 - 5}{h}$

$= \frac{h^2 + 2xh}{h}$

$= h + 2x$



The graph of $y = \sqrt{x - 1}$. The domain is all real $x \geq 1$ and the range is all real $y \geq 0$.

The graph of $y = |2x|$. Its domain is all real $x$ and range all real $y \geq 0$.

The graph of y = [formula illegible]. The domain is all real $x \neq 4$ and the range is all real $y \neq 0$.



4. a. 


The graph of $y = |2x| - 1$. The domain is all real $x$, and the range is all real
$y \geq -1$.

The perpendicular distance $d$ from $(0, 0)$ to $x + y + k = 0$ is $d = \left|\frac{k}{\sqrt{2}}\right|$.


b. For the line $x + y + k = 0$ to cut the circle in two distinct points, $d < 2$, i.e. $|k| < 2\sqrt{2}$
or $-2\sqrt{2} < k < 2\sqrt{2}$.


5. a. 



The graph of y — {\) 



The graph of $y^2 = x^2$


6. $y^2 = x^3$ is not a function.



The graph of $y = -\sqrt{4 - x^2}$. This is a
function with the domain: all real $x$ such
that $-2 \leq x \leq 2$ and range: all real $y$
such that $-2 \leq y \leq 0$.


The graph of \x\ — \y\ = 0. This is not 
the graph of a function. 





The graph of $y = x^3$. This is a function
with the domain: all real $x$ and range:
all real $y$.


The graph of $y = \frac{x}{|x|}$. This is the graph
of a function which is not defined at $x = 0$.
Its domain is all real $x \neq 0$, and range
is $y = \pm 1$.



The graph of \y\ = x. This is not the graph of a function. 



8.

$A\left(\frac{1}{p}\right) = \left(\frac{1}{p}\right)^2 + 2 + \frac{1}{\left(\frac{1}{p}\right)^2}$

$= \frac{1}{p^2} + 2 + p^2$

$= A(p)$

9. a. The values of $x$ in the interval $0 < x < 4$ are not in the domain of the function.
b. $x = 1$ and $x = -1$ are not in the domain of the function.


10. a. $\phi(3) + \phi(4) + \phi(5) = \log(2.5)$

b. $\phi(3) + \phi(4) + \phi(5) + \cdots + \phi(n) = \log\left(\frac{n}{2}\right)$

11. a. y = 3 when z = 3. 

b. i L(M(x)) = 2(x 2 — x) + 1 
ii M(L(x )) = 4x 2 + 2x 

12. a. $a = 2$, $b = 2$ so the equation is $y = 2x^2 - 2$.
b. a = 5, b = 1 so the equation is y = 



The graph of $|x| + |y| = 1$.

14. $S(n - 1) = \frac{n - 1}{2n - 1}$

Hence

$S(n) - S(n - 1) = \frac{n}{2n + 1} - \frac{n - 1}{2n - 1}$

$= \frac{n(2n - 1) - (2n + 1)(n - 1)}{(2n - 1)(2n + 1)}$

$= \frac{2n^2 - n - (2n^2 - n - 1)}{(2n - 1)(2n + 1)}$

$= \frac{1}{(2n - 1)(2n + 1)}$
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b. 



The points of intersection are $(-1, 3)$ and $(5, 3)$.

Therefore the solutions of $|x - 2| = 3$ are $x = -1$ and $x = 5$.

14. The parabolas intersect at (2,1). 


15. 



y = k intersects the circle at two distinct points when 2 < k < 12. 


16. 



The point of intersection is $(1, 1)$. Therefore the solution of = 1 is $x = 1$.





17. 



The point of intersection is $(0, 2)$. Therefore the solution of $|x - 2| = |x + 2|$ is $x = 0$.

18. $n = -1$ or $n = 2$.


19. a. For $x \geq 4$, $|x - 4| = x - 4 = 2x$ when $x = -4$, but this does not satisfy the
condition $x \geq 4$ so is not a solution.

For $x < 4$, $|x - 4| = -x + 4 = 2x$ when $x = \frac{4}{3}$. Since $\frac{4}{3} < 4$, this is a solution.
Therefore, $x = \frac{4}{3}$ is a solution of $|x - 4| = 2x$.



The graphs of $y = |x - 4|$ and $y = 2x$ intersect at the point $\left(\frac{4}{3}, \frac{8}{3}\right)$. So the solution
of $|x - 4| = 2x$ is $x = \frac{4}{3}$.


2.11 Solutions 

1. a. The domain is all real $x$, and the range is all real $y \geq -2$.

b. i $-2 < x < 0$ or $x > 2$

ii $x < -2$ or $0 < x < 2$

c. i $k < -2$

ii There is no value of $k$ for which $f(x) = k$ has exactly one solution.

iii $k = -2$ or $k > 0$

iv $k = 0$

v $-2 < k < 0$


d. y = f{x) is even 

a. b. 



y = f(x) is odd. 



y = g(x) is even. y = g(x) is odd. 


a. even b. even c. neither d. odd e. odd

f. even g. even h. neither i. even j. even


4. a.

$h(-x) = f(-x) \cdot g(-x)$
$= f(x) \cdot (-g(x))$
$= -f(x) \cdot g(x)$
$= -h(x)$

Therefore $h$ is odd.

b.

$h(-x) = (g(-x))^2$
$= (-g(x))^2$
$= (g(x))^2$
$= h(x)$

Therefore $h$ is even.

c.

$h(-x) = \frac{f(-x)}{g(-x)}$
$= \frac{f(x)}{-g(x)}$
$= -\frac{f(x)}{g(x)}$
$= -h(x)$

Therefore $h$ is odd.

d.

$h(-x) = f(-x) \cdot (g(-x))^2$
$= f(x) \cdot (-g(x))^2$
$= f(x) \cdot (g(x))^2$
$= h(x)$

Therefore $h$ is even.


5. If $f$ is defined at $x = 0$,

$f(0) = f(-0)$  (since $0 = -0$)
$= -f(0)$  (since $f$ is odd)
$2f(0) = 0$  (adding $f(0)$ to both sides)

Therefore $f(0) = 0$.
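The parity results in question 4 can be spot-checked on sample points. The sketch below is illustrative only: `parity` is a hypothetical helper, and $f(x) = x^2 + 1$ (even) and $g(x) = x^3 - x$ (odd) are assumed stand-ins, since the exercise itself leaves $f$ and $g$ general. Agreement on a finite sample is evidence, not a proof.

```python
from fractions import Fraction

def parity(func, samples):
    """Classify func as 'even', 'odd', 'both' or 'neither' on the sample points."""
    is_even = all(func(-x) == func(x) for x in samples)
    is_odd = all(func(-x) == -func(x) for x in samples)
    if is_even and is_odd:
        return "both"          # happens only if func is zero on every sample
    if is_even:
        return "even"
    if is_odd:
        return "odd"
    return "neither"

samples = list(range(-5, 6))
f = lambda x: x**2 + 1         # assumed even function
g = lambda x: x**3 - x         # assumed odd function
nonzero = [x for x in samples if g(x) != 0]   # avoid dividing by zero

product_parity = parity(lambda x: f(x) * g(x), samples)
square_parity = parity(lambda x: g(x) ** 2, samples)
quotient_parity = parity(lambda x: Fraction(f(x), g(x)), nonzero)
mixed_parity = parity(lambda x: f(x) * g(x) ** 2, samples)
print(product_parity, square_parity, quotient_parity, mixed_parity)
```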


3.2 Solutions 


1. a. $2f(-1) + f(2) = 2(1 - (-1)) + (1 - (2)^2) = 4 + (-3) = 1$.
b. $f(a^2) = 1 - (a^2)^2 = 1 - a^4$ since $a^2 \geq 0$.

2. You can see from the graph below that there is one solution to $f(x) = 2$, and that
this solution is at $x = -1$.
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3. 


$g(x) = \begin{cases}
\frac{1}{x + 1} & \text{for } x < -1 \\
\sqrt{1 - x^2} & \text{for } -1 \leq x \leq 1 \\
-1 & \text{for } x > 1
\end{cases}$


4. a. The domain of $f$ is all real $x \geq -2$.



b. The range of $f$ is all real $y \geq -4$.

c. i f(x) = 0 when x = —2 or x = 2 . 

ii f(x) = —3 when x = 1. 

d. i f(x) = k has no solutions when k < —4. 

ii f(x) = k has 1 solution when —4 < k < — 2 or k > 0. 

iii f(x) = k has 2 solutions when —2 < k < 0 . 

5. Note that $f(0) = 0$.



6. The domain of $g$ is all real $x$, $x \neq -2$.





The range of g is all real y < 0 or y > 2. 

7. Note that there may be more than one correct solution. 

a. Defining $f$ as

$f(x) = \begin{cases}
x + 6 & \text{for } x \leq -3 \\
-x & \text{for } -3 < x < 0 \\
x & \text{for } 0 \leq x \leq 3 \\
-x + 6 & \text{for } x > 3
\end{cases}$

gives a function describing the McMaths burgers’ logo using 4 pieces.

b. Defining $f$ as

$f(x) = \begin{cases}
x + 6 & \text{for } x \leq -3 \\
|x| & \text{for } -3 < x < 3 \\
-x + 6 & \text{for } x \geq 3
\end{cases}$

gives a function describing the McMaths burgers’ logo using 3 pieces.

c. Defining $f$ as

$f(x) = \begin{cases}
3 - |x + 3| & \text{for } x \leq 0 \\
3 - |x - 3| & \text{for } x > 0
\end{cases}$

gives a function describing the McMaths burgers’ logo using 2 pieces.

8. a. Here $a = 1$, $b = -4$, $c = 2$ and $d = -4$. So,

$f(x) = \begin{cases}
x^2 - 4 & \text{for } 0 < x < 2 \\
2x - 4 & \text{for } x \geq 2
\end{cases}$

b. Defining $f$ to be an odd function for all real $x$, $x \neq 0$, we get
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c. We can define / as follows 

$f(x) = \begin{cases}
2x + 4 & \text{for } x \leq -2 \\
4 - x^2 & \text{for } -2 < x < 0 \\
x^2 - 4 & \text{for } 0 < x < 2 \\
2x - 4 & \text{for } x \geq 2
\end{cases}$

3.4 Solutions 

1. a. 0 < x < 4 

b. — 3 < p < 1 

c. x < —4 or —3 < x < 3 or x > 4 

2. a. The graph of $y = 4x(x - 3)$ is given below


b. From the graph we see that 4x(x — 3) < 0 when 0 < x < 3 




a. The graphs $y = 5 - x$ and $y = \frac{4}{x}$ intersect at the points $(1, 4)$ and $(4, 1)$.

b. The graphs of $y = 5 - x$ and $y = \frac{4}{x}$.



c. The inequality is satisfied for $x < 0$ or $1 < x < 4$.

a. The graph of $y = 2^x$.

b. $2^x < \frac{1}{2}$ when $x < -1$.

c. The midpoint $M$ of the segment $AB$ has coordinates $\left(\frac{a + b}{2}, \frac{2^a + 2^b}{2}\right)$.
Since the function $y = 2^x$ is concave up, the $y$-coordinate of $M$ is greater than
$f\left(\frac{a + b}{2}\right)$ so,

$\frac{2^a + 2^b}{2} > 2^{\frac{a + b}{2}}$.






b. x = 2 is the repeated root. 

c. The equation f(x) = k has exactly 3 solutions when k = 0 or k = 3.23. 

d. f(x) < 0 when — 2 < x < 0. 

e. The least possible degree of the polynomial f(x) is 4. 

f. Since $f(0) = 0$, the constant in the polynomial is 0.

g. f(x) + k > 0 for all real x when k > 9.91. 

4.5 Solutions 

1. a. Since $A(x) = (x - a)(x - b)$ is a polynomial of degree 2, the remainder $R(x)$ must
be a polynomial of degree $< 2$. So, $R(x)$ is a polynomial of degree $\leq 1$. That is,
$R(x) = mx + c$ where $m$ and $c$ are constants. Note that if $m = 0$ the remainder
is a constant.

b. Let $P(x) = (x^2 - 5x + 6)Q(x) + (mx + c) = (x - 2)(x - 3)Q(x) + (mx + c)$.

Then

$P(2) = (0)(-1)Q(2) + (2m + c)$
$= 2m + c$
$= 4$

and

$P(3) = (1)(0)Q(3) + (3m + c)$
$= 3m + c$
$= 9$

Solving simultaneously we get that $m = 5$ and $c = -6$. So, the remainder is
$R(x) = 5x - 6$.
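The pair of simultaneous equations in part b can be solved mechanically. The following sketch is an illustration rather than part of the original solution; `linear_remainder` is a hypothetical helper that takes the two divisors' roots and the two known remainders.

```python
from fractions import Fraction

def linear_remainder(p, rp, q, rq):
    """Remainder m*x + c on division by (x - p)(x - q), given P(p) = rp and P(q) = rq."""
    if p == q:
        raise ValueError("the two roots must be distinct")
    # Subtracting m*p + c = rp from m*q + c = rq eliminates c.
    m = Fraction(rq - rp, q - p)
    c = rp - m * p
    return m, c

m, c = linear_remainder(2, 4, 3, 9)       # part b: P(2) = 4, P(3) = 9
print(m, c)

# Part c with sample values a = 3, b = 7: P(a) = a^2 and P(b) = b^2.
m_c, c_c = linear_remainder(3, 9, 7, 49)
print(m_c, c_c)
```

For the sample values in the second call the output agrees with the general pattern $m = a + b$ and $c = -ab$.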

c. Let $P(x) = (x - a)(x - b)Q(x) + (mx + c)$.

Then

$P(a) = (0)(a - b)Q(a) + (ma + c)$
$= am + c$
$= a^2$

and

$P(b) = (b - a)(0)Q(b) + (mb + c)$
$= bm + c$
$= b^2$

Solving simultaneously we get that $m = a + b$ and $c = -ab$ provided $a \neq b$.

So, $R(x) = (a + b)x - ab$.

2. a. 

$2x^4 + 13x^3 + 18x^2 + x - 4 = (x^2 + 5x + 2)(2x^2 + 3x - 1) - 2$

b. Let $\alpha$ be a common zero of $f(x)$ and $g(x)$. That is, $f(\alpha) = 0$ and $g(\alpha) = 0$.
Then since $f(x) = g(x)q(x) + r(x)$ we have

$f(\alpha) = g(\alpha)q(\alpha) + r(\alpha)$
$= (0)q(\alpha) + r(\alpha)$  since $g(\alpha) = 0$
$= r(\alpha)$
$= 0$  since $f(\alpha) = 0$

But, from part a, $r(x) = -2$ for all values of $x$, so we have a contradiction.
Therefore, $f(x)$ and $g(x)$ do not have a common zero.

This is an example of a proof by contradiction.
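The division in part a can be reproduced with a short long-division routine. This is a sketch under the assumption that polynomials are stored as coefficient lists, highest degree first; `poly_divmod` is a hypothetical name, not a library function.

```python
from fractions import Fraction

def poly_divmod(dividend, divisor):
    """Long division of polynomials given as coefficient lists, highest degree first."""
    remainder = [Fraction(c) for c in dividend]
    quotient = []
    while len(remainder) >= len(divisor):
        factor = remainder[0] / divisor[0]
        quotient.append(factor)
        # Subtract factor * divisor, aligned with the leading term, then drop that term.
        for i, d in enumerate(divisor):
            remainder[i] -= factor * d
        remainder.pop(0)
    return quotient, remainder

f = [2, 13, 18, 1, -4]      # 2x^4 + 13x^3 + 18x^2 + x - 4
g = [1, 5, 2]               # x^2 + 5x + 2
q, r = poly_divmod(f, g)
print(q, r)
```

Here `q` encodes $2x^2 + 3x - 1$ and `r` encodes the constant remainder $-2$ (with a zero coefficient for $x$), which is the nonzero constant that the contradiction in part b relies on.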


3. a. i $P(x) = x^3 - x^2 - 10x - 8 = (x + 1)(x + 2)(x - 4)$

ii $x = -1$, $x = -2$ and $x = 4$ are solutions of $P(x) = 0$.

iii The graph of $P(x) = x^3 - x^2 - 10x - 8$.

b. i $P(x) = x^3 - x^2 - 16x - 20 = (x + 2)^2(x - 5)$.

ii $x = -2$ and $x = 5$ are solutions of $P(x) = 0$. $x = -2$ is a double root.

iii The graph of $P(x) = x^3 - x^2 - 16x - 20$.

c. i $P(x) = x^3 + 4x^2 - 8 = (x + 2)(x^2 + 2x - 4) = (x + 2)(x - (-1 + \sqrt{5}))(x - (-1 - \sqrt{5}))$

ii $x = -2$, $x = -1 + \sqrt{5}$ and $x = -1 - \sqrt{5}$ are solutions of $P(x) = 0$.

iii The graph of $P(x) = x^3 + 4x^2 - 8$.
The zeros are $x = -2$, $x = -1 + \sqrt{5}$ and $x = -1 - \sqrt{5}$.

d. i $P(x) = x^3 - x^2 + x - 6 = (x - 2)(x^2 + x + 3)$. $x^2 + x + 3 = 0$ has no real
solutions.

ii $x = 2$ is the only real solution of $P(x) = 0$.

iii The graph of $P(x) = x^3 - x^2 + x - 6$.
There is only one real zero at $x = 2$.

e. i $P(x) = 2x^3 - 3x^2 - 11x + 6 = (x + 2)(x - 3)(2x - 1)$.

ii $x = -2$, $x = \frac{1}{2}$ and $x = 3$ are solutions of $P(x) = 0$.

iii The graph of $P(x) = 2x^3 - 3x^2 - 11x + 6$.



Chapter four: 

Lecture notes on relations 

and functions 




1.1. The idea of a relation. Let $X$ and $Y$ be two sets. We would like to formalize

the idea of a relation between $X$ and $Y$. Intuitively speaking, this is a well-defined

Example 1.2. */zef .9 ie a set an<i //-/ y _ y _ oS >i 

*■ “«««/«« subsets oiTZLLLtl c7: c !:Lt <re 7 l ! l,Ml 

\ and Y. (Proper containment, A C ft, C tCt " CC " 

^- - zzz:;°/n y «:r* Rv 

Example 1.4. Let $X = Y = \mathbb{R}$. Then $\leq$, $<$, $\geq$, $>$ are relations between $\mathbb{R}$ and $\mathbb{R}$.

«To7;^ )i:; ci ' on - rhm - -**-««*>*. 

April 15, 2016. 
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Example 1.6. Let X = Y = Z. Then divisibility is a relation between Z and Z: 
we say $xRy$ if $x \mid y$.

Example 1.7. Let X = Y = Z. Then “having the same parity” is a relation 
between Z and Z. 

In many of the above examples we have X = Y. This will often (but certainly not 
always!) be the case, and when it is we may speak of relations on X. 

1.2. The formal definition of a relation. 

We still have not given a formal definition of a relation between sets $X$ and $Y$. In
fact the above way of thinking about relations is easily formalized, as was suggested
in class by Adam Osborne: namely, we can think of a relation $R$ as a function $f$ from
$X \times Y$ to the two-element set {TRUE, FALSE}. In other words, for $(x, y) \in X \times Y$,
we say that $xRy$ if and only if $f((x, y)) =$ TRUE.

This is a great way of thinking about relations. It has however one foundational
drawback: it makes the definition of a relation depend on that of a function, whereas
the standard practice for about one hundred years is the reverse: we want to define
a function as a special kind of relation (cf. Example 1.5 above). The familiar
correspondence between logic and set theory leads us to the official definition:

Definition: A relation $R$ between two sets $X$ and $Y$ is simply a subset of the
Cartesian product $X \times Y$, i.e., a collection of ordered pairs $(x, y)$.

(Thus we have replaced the basic logical dichotomy “TRUE/FALSE” with the basic
set-theoretic dichotomy “is a member of/is not a member of”.) Note that this new
definition has some geometric appeal: we are essentially identifying a relation $R$
with its graph in the sense of precalculus mathematics.

We take advantage of the definition to adjust the terminology: rather than speaking
(slightly awkwardly) of relations “from $X$ to $Y$” we will now speak of relations on
$X \times Y$. When $X = Y$ we may (but need not!) speak of relations on $X$.

Example 1.8. Any curve in $\mathbb{R}^2$ defines a relation on $\mathbb{R} \times \mathbb{R}$. E.g. the unit circle

$x^2 + y^2 = 1$

is a relation in the plane: it is just a set of ordered pairs. 

1.3. Basic terminology and further examples. 

Let $X, Y$ be sets. We consider the set of all relations on $X \times Y$ and denote it
by $\mathcal{R}(X, Y)$. According to our formal definition we have

$\mathcal{R}(X, Y) = 2^{X \times Y}$,

i.e., the set of all subsets of the Cartesian product $X \times Y$.

Example 1.9. a) Suppose $X = \emptyset$. Then $X \times Y = \emptyset$ and $\mathcal{R}(X, Y) = 2^{\emptyset} = \{\emptyset\}$.
That is: if $X$ is empty, then the set of ordered pairs $(x, y)$ for $x \in X$ and $y \in Y$ is
empty, so there is only one relation: the empty relation.

b) Suppose $Y = \emptyset$. Again $X \times Y = \emptyset$ and the discussion is the same as above.

Example 1.10. a) Suppose $X = \{\bullet\}$ consists of a single element. Then $X \times Y =
\{(\bullet, y) \mid y \in Y\}$; in other words, $X \times Y$ is essentially just $Y$ itself, since the first
coordinate is always the same. Thus a relation $R$ on $X \times Y$ corresponds to a subset
of $Y$: formally, the set of all $y \in Y$ such that $\bullet R y$.

b) Suppose $Y = \{\bullet\}$ consists of a single element. The discussion is analogous to that
of part a), and relations on $X \times Y$ correspond to subsets of $X$.

Example 1.11. Suppose $X$ and $Y$ are finite sets, with $\#X = m$ and $\#Y = n$.
Then $\mathcal{R}(X, Y) = 2^{X \times Y}$ is finite, of cardinality

$\#2^{X \times Y} = 2^{\#(X \times Y)} = 2^{\#X \cdot \#Y} = 2^{mn}$.

The function $2^{mn}$ grows rapidly with both $m$ and $n$, and the upshot is that if $X$ and
$Y$ are even moderately large finite sets, the set of all relations on $X \times Y$ is very
large. For instance if $X = \{a, b\}$ and $Y = \{1, 2\}$ then there are $2^{2 \cdot 2} = 16$ relations
on $X \times Y$. It is probably a good exercise for you to write them all down. However,
if $X = \{a, b, c\}$ and $Y = \{1, 2, 3\}$ then there are $2^{3 \cdot 3} = 512$ relations on $X \times Y$, and
(with apologies to the Jackson 5) it is less easy to write them all down.
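Writing the relations down by hand is tedious, but a machine can list them. The sketch below is an illustration, not part of the notes: `all_relations` is a hypothetical helper that enumerates every subset of the Cartesian product using the standard `itertools` module.

```python
from itertools import chain, combinations, product

def all_relations(X, Y):
    """Every subset of the Cartesian product X x Y, as a list of frozensets."""
    pairs = list(product(X, Y))
    subsets = chain.from_iterable(
        combinations(pairs, k) for k in range(len(pairs) + 1)
    )
    return [frozenset(s) for s in subsets]

small = all_relations(["a", "b"], [1, 2])
larger = all_relations(["a", "b", "c"], [1, 2, 3])
print(len(small), len(larger))
```

The list `small` contains both extremes: the empty relation and the full product $X \times Y$.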

Exercise 1.1. Let $X$ and $Y$ be nonempty sets, at least one of which is infinite.
Show: $\mathcal{R}(X, Y)$ is infinite.

Given two relations $R_1$ and $R_2$ between $X$ and $Y$, it makes sense to say that
$R_1 \subseteq R_2$: this means that $R_1$ is “stricter” than $R_2$ or that $R_2$ is “more permissive”
than $R_1$. This is a very natural idea: for instance, if $X$ is the set of people
in the world, $R_1$ is the brotherhood relation (i.e., $(x, y) \in R_1$ iff $x$ and $y$ are
brothers) and $R_2$ is the sibling relation (i.e., $(x, y) \in R_2$ iff $x$ and $y$ are siblings),
then $R_1 \subseteq R_2$: if $x$ and $y$ are brothers then they are also siblings, but not conversely.

Among all elements of $\mathcal{R}(X, Y)$, there is one relation $R_\emptyset$ which is the strictest
of all, namely $R_\emptyset = \emptyset$:¹ that is, for no $(x, y) \in X \times Y$ do we have $(x, y) \in R_\emptyset$. Indeed
$R_\emptyset \subseteq R$ for any $R \in \mathcal{R}(X, Y)$. At the other extreme, there is a relation which
is the most permissive, namely $R_{X \times Y} = X \times Y$ itself: that is, for all $(x, y) \in X \times Y$
we have $(x, y) \in R_{X \times Y}$. And indeed $R \subseteq R_{X \times Y}$ for any $R \in \mathcal{R}(X, Y)$.

Example 1.12. Let $X = Y$. The equality relation $R = \{(x, x) \mid x \in X\}$ can be
thought of geometrically as the diagonal of $X \times Y$.

The domain² of a relation $R \subseteq X \times Y$ is the set of $x \in X$ such that there exists
$y \in Y$ with $(x, y) \in R$. In other words, it is the set of all elements of $X$ which relate
to at least one element of $Y$.

Example 1.13. The circle relation $\{(x, y) \in \mathbb{R}^2 \mid x^2 + y^2 = 1\}$ has domain $[-1, 1]$.

Given a relation $R \subseteq X \times Y$, we can define the inverse relation $R^{-1} \subseteq Y \times X$ by
interchanging the order of the coordinates. Formally, we put

$R^{-1} = \{(y, x) \in Y \times X \mid (x, y) \in R\}$.

Geometrically, this corresponds to reflecting across the line $y = x$.

¹ The notation here is just to emphasize that we are viewing $\emptyset$ as a relation on $X \times Y$.

² I don’t like this terminology. But it is used in the course text, and it would be confusing to
change it.

Example 1.14. Consider the relation $R \subseteq \mathbb{R} \times \mathbb{R}$ attached to the function $f(x) = x^2$:

$R = \{(x, x^2) \mid x \in \mathbb{R}\}$.

The graph of this relation is an upward-opening parabola: it can also be described
by the equation $y = x^2$. The inverse relation $R^{-1}$ is $\{(x^2, x) \mid x \in \mathbb{R}\}$, which
corresponds to the equation $x = y^2$ and geometrically is a parabola opening rightward.
Note that the domain of the original relation $R$ is $\mathbb{R}$, whereas the domain of
$R^{-1}$ is $[0, \infty)$. Moreover, $R^{-1}$ is not a function, since some values of $x$ relate to
more than one $y$-value: e.g. $(1, 1)$ and $(1, -1)$ are both in $R^{-1}$.

Example 1.15. Consider the relation attached to the function $f(x) = x^3$: namely

$R = \{(x, x^3) \mid x \in \mathbb{R}\}$.

This relation is described by the equation $y = x^3$; certainly it is a function, and its
domain is $\mathbb{R}$. Consider the inverse relation

$R^{-1} = \{(x^3, x) \mid x \in \mathbb{R}\}$,

which is described by the equation $x = y^3$. Since every real number has a unique
real cube root, this is equivalent to $y = x^{1/3}$. Thus this time $R^{-1}$ is again a function,
and its domain is $\mathbb{R}$.

Later we will study functions in detail and one of our main goals will be to understand
the difference between Examples 1.14 and 1.15.
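The contrast between the two examples can be seen on a finite sample. The sketch below is only an illustration: the integers from $-3$ to $3$ stand in for the real line, and `domain`, `inverse` and `is_single_valued` are hypothetical helpers written for this purpose.

```python
def domain(relation):
    """First coordinates that relate to at least one second coordinate."""
    return {x for (x, _) in relation}

def inverse(relation):
    """Swap the coordinates of every ordered pair."""
    return {(y, x) for (x, y) in relation}

def is_single_valued(relation):
    """True if no first coordinate is paired with two different second coordinates."""
    return len(domain(relation)) == len(relation)

sample = range(-3, 4)                        # finite stand-in for the real line
squares = {(x, x**2) for x in sample}        # y = x^2
cubes = {(x, x**3) for x in sample}          # y = x^3

squares_inverse = inverse(squares)
cubes_inverse = inverse(cubes)
print(is_single_valued(squares_inverse), is_single_valued(cubes_inverse))
print(sorted(domain(squares_inverse)))
```

On this sample the inverse of the squaring relation fails to be single-valued because both $(1, 1)$ and $(1, -1)$ belong to it, while the inverse of the cubing relation passes the check.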

1.4. Properties of relations. 

Let $X$ be a set. We now consider various properties that a relation $R$ on $X$,
i.e., $R \subseteq X \times X$, may or may not possess.

Reflexivity: For all $x \in X$, $(x, x) \in R$.

In other words, each element of $X$ bears relation $R$ to itself. Another way to
say this is that the relation $R$ contains the equality relation on $X$.

Exercise 1.2. Which of the relations in Examples 1.1 through 1.15 are reflexive?

Anti-reflexivity: For all $x \in X$, $(x, x) \notin R$.

Certainly no relation on $X$ is both reflexive and anti-reflexive (except in the silly
case $X = \emptyset$ when both properties hold vacuously). However, notice that a relation
need not be either reflexive or anti-reflexive: if there are $x, y \in X$ such that
$(x, x) \in R$ and $(y, y) \notin R$, then neither property holds.

Symmetry: For all $x, y \in X$, if $(x, y) \in R$, then $(y, x) \in R$.

Again, this has a geometric interpretation in terms of symmetry across the diagonal
$y = x$. For instance, the relation associated to the function $y = \frac{1}{x}$ is symmetric
since interchanging $x$ and $y$ changes nothing, whereas the relation associated to the
function $y = x^2$ is not. (Looking ahead a bit, a function $y = f(x)$ is symmetric iff
it coincides with its own inverse function.)

Exercise 1.3. Which of the relations in Examples 1.1 through 1.15 are symmetric?

Example 1.16. Let $V$ be a set. A (simple, loopless, undirected) graph (in
the sense of graph theory, not graphs of functions!) is given by a relation $E$ on $V$
which is anti-reflexive (also called irreflexive) and symmetric. Thus: for $x, y \in V$, we say that $x$ and $y$ are
adjacent if $(x, y) \in E$. Moreover $x$ is never adjacent to itself, and the adjacency
of $x$ and $y$ is a property of the unordered pair $\{x, y\}$: if $x$ is adjacent to $y$ then $y$ is
adjacent to $x$.

Anti-Symmetry: for all $x, y \in X$, if $(x, y) \in R$ and $(y, x) \in R$, then $x = y$.

Exercise 1.4. Which of the relations in Examples 1.1 through 1.16 are
anti-symmetric?

Transitivity: for all $x, y, z \in X$, if $(x, y) \in R$ and $(y, z) \in R$, then $(x, z) \in R$.

“Being a parent of” is not transitive, but “being an ancestor of” is transitive.

Exercise 1.5. Which of the relations in Examples 1.1 through 1.15 are transitive?

Worked Exercise 1.6. 

Let R be a relation on X. Show the following are equivalent: 

(i) R is both symmetric and anti-symmetric. 

(ii) R is a subrelation of the equality relation. 

Solution: Suppose that we have a relation $R$ on $X$ which is both symmetric and
anti-symmetric. Then, for all $x, y \in X$, if $(x, y) \in R$, then by symmetry we have
also $(y, x) \in R$, and then by anti-symmetry we have $x = y$. Thus we’ve shown
that if (i) holds, the only possible elements $(x, y) \in R$ are those of the form $(x, x)$,
which means that $R$ is a subrelation of the equality relation. Conversely, if $R$ is
a subrelation of equality and $(x, y) \in R$, then $y = x$, so $(y, x) \in R$. Similarly, if
$(x, y) \in R$ and $(y, x) \in R$ then $x = y$. So $R$ is both symmetric and anti-symmetric.

Now we make two further definitions of relations which possess certain combinations
of these basic properties. The first is the most important definition in this section.

An equivalence relation on a set $X$ is a relation on $X$ which is reflexive,
symmetric and transitive.

A partial ordering on a set $X$ is a relation on $X$ which is reflexive, anti-symmetric
and transitive.

Exercise 1.7. Which of the relations in Examples 1.1 through 1.16 are equivalence 
relations? Which are partial orderings? 
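For relations on a finite set, each of these properties can be tested by brute force. The following sketch is illustrative only; the function names are hypothetical, and the two sample relations (divisibility and same parity on $\{1, \ldots, 6\}$) are finite stand-ins for Examples 1.6 and 1.7. Note that divisibility is restricted here to positive integers, which is an assumption of the sketch.

```python
def is_reflexive(R, X):
    return all((x, x) in R for x in X)

def is_symmetric(R):
    return all((y, x) in R for (x, y) in R)

def is_antisymmetric(R):
    return all(x == y for (x, y) in R if (y, x) in R)

def is_transitive(R):
    return all((x, w) in R for (x, y) in R for (z, w) in R if y == z)

def is_equivalence(R, X):
    return is_reflexive(R, X) and is_symmetric(R) and is_transitive(R)

def is_partial_order(R, X):
    return is_reflexive(R, X) and is_antisymmetric(R) and is_transitive(R)

X = range(1, 7)
divides = {(a, b) for a in X for b in X if b % a == 0}
same_parity = {(a, b) for a in X for b in X if (a - b) % 2 == 0}

print(is_equivalence(same_parity, X), is_partial_order(same_parity, X))
print(is_equivalence(divides, X), is_partial_order(divides, X))
```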

We often denote equivalence relations by a tilde, $x \sim y$, and read $x \sim y$ as “$x$
is equivalent to $y$”. For instance, the relation “having the same parity” on $\mathbb{Z}$ is an
equivalence relation, and $x \sim y$ means that $x$ and $y$ are both even or both odd.
Thus it serves to group the elements of $\mathbb{Z}$ into subsets which share some common
property. In this case, all the even numbers are being grouped together and all
the odd numbers are being grouped together. We will see shortly that this is a
characteristic property of equivalence relations: every equivalence relation on a set
$X$ determines a partition of $X$ and conversely, given any partition of $X$ we can
define an equivalence relation.

The concept of a partial ordering should be regarded as a “generalized less than
or equal to” relation. Perhaps the best example is the containment relation $\subseteq$ on
the power set $\mathcal{P}(S)$ of a set $S$. This is a very natural way of regarding one set as
“bigger” or “smaller” than another set. Thus the insight here is that containment
satisfies many of the formal properties of the more familiar $\leq$ on numbers. However
there is one property of $\leq$ on numbers that does not generalize to $\subseteq$ (and hence
not to an arbitrary partial ordering): namely, given any two real numbers $x, y$ we
must have either $x \leq y$ or $y \leq x$. However for sets this does not need to be the case
(unless $S$ has at most one element). For instance, in the power set of the positive
integers, take $A = \{1\}$ and $B = \{2\}$: then neither $A \subseteq B$ nor $B \subseteq A$ holds.
Comparability of every pair is a much stronger property of a relation:

Totality: For all $x, y \in X$, either $(x, y) \in R$ or $(y, x) \in R$.

A total ordering (or linear ordering) on a set $X$ is a partial ordering satisfying
totality.

Example 1.17. The relation $\leq$ on $\mathbb{R}$ is a total ordering.

There is an entire branch of mathematics, order theory, devoted to the study
of partial orderings.³ In my opinion order theory gets short shrift in the standard
mathematics curriculum (especially at the advanced undergraduate and graduate
levels): most students learn only a few isolated results which they apply frequently
but with little context or insight. Unfortunately we are not in a position to combat
this trend: partial and total orderings will get short shrift here as well!

1.5. Partitions and Equivalence Relations. 

Let $X$ be a set, and let $\sim$ be an equivalence relation on $X$.

For $x \in X$, we define the equivalence class of $x$ as

$[x] = \{y \in X \mid y \sim x\}$.

For example, if $\sim$ is the relation “having the same parity” on $\mathbb{Z}$, then

$[2] = \{\ldots, -4, -2, 0, 2, 4, \ldots\}$,
i.e., the set of all even integers. Similarly

$[1] = \{\ldots, -3, -1, 1, 3, \ldots\}$

is the set of all odd integers. But an equivalence class in general has many
“representatives”. For instance, the equivalence class $[4]$ is the set of all integers having
the same parity as 4, so is again the set of all even integers: $[4] = [2]$. More generally,
for any even integer $n$, we have $[n] = [0]$ and for any odd integer $n$ we have
$[n] = [1]$. Thus in this case we have partitioned the integers into two subsets: the
even integers and the odd integers.

We claim that given any equivalence relation $\sim$ on a set $X$, the set $\{[x] \mid x \in X\}$
forms a partition of $X$. Before we proceed to demonstrate this, observe that we
are now strongly using our convention that there is no “multiplicity” associated to
membership in a set: e.g. the sets $\{4, 2 + 2, 1^1 + 3^0 + 2^1\}$ and $\{4\}$ are equal. The
above representation $\{[x] \mid x \in X\}$ is highly redundant: for instance in the above
example we are writing down the set of even integers and the set of odd integers
infinitely many times, but it only “counts once” in order to build the set of subsets
which gives the partition.

³ For instance, there is a journal called Order, in which a paper of mine appears.

With this disposed of, the verification that $\mathcal{P} = \{[x] \mid x \in X\}$ gives a partition
of $X$ comes down to recalling the definition of a partition and then following our
noses. There are three properties to verify:

(i) That every element of $\mathcal{P}$ is nonempty. Indeed, the element $[x]$ is nonempty
because it contains $x$! This is by reflexivity: $x \sim x$, so $x \in \{y \in X \mid y \sim x\}$.

(ii) That the union of all the elements of $\mathcal{P}$ is all of $X$. But again, the union is
indexed by the elements $x$ of $X$, and we just saw that $x \in [x]$, so every $x$ in $X$ is
indeed in at least one element of $\mathcal{P}$.

(iii) Finally, we must show that if $[x] \cap [y] \neq \emptyset$, then $[x] = [y]$: i.e., any two elements
of $\mathcal{P}$ which have a common element must be the same element. So suppose that
there exists $z \in [x] \cap [y]$. Writing this out, we have $z \sim x$ and $z \sim y$. By symmetry,
we have $y \sim z$; from this and $z \sim x$, we deduce by transitivity that $y \sim x$, i.e.,
$y \in [x]$. We claim that it follows from this that $[y] \subseteq [x]$. To see this, take any
$w \in [y]$, so that $w \sim y$. Since $y \sim x$, we conclude by transitivity that $w \sim x$, so $w \in [x]$. Rerunning the
above argument with the roles of $x$ and $y$ interchanged we get also that $[x] \subseteq [y]$,
so $[x] = [y]$. This completes the verification.

Note that the key fact underlying the proof was that any two equivalence classes 
[x] and [y] are either disjoint or coincident. Note also that we did indeed use all 
three properties of an equivalence relation. 

Now we wish to go in the other direction. Suppose $X$ is a set and $\mathcal{P} = \{U_i\}_{i \in I}$ is a
partition of $X$ (here $I$ is just an index set). We can define an equivalence relation $\sim$
on $X$ as follows: we say that $x \sim y$ if there exists $i \in I$ such that $x, y \in U_i$. In other
words, we are decreeing $x$ and $y$ to be equivalent exactly when they lie in the same
“piece” of the partition. Let us verify that this is an equivalence relation. First, let
$x \in X$. Then, since $\mathcal{P}$ is a partition, there exists some $i \in I$ such that $x \in U_i$, and
then $x$ and $x$ are both in $U_i$, so $x \sim x$. Next, suppose that $x \sim y$: this means that
there exists $i \in I$ such that $x$ and $y$ are both in $U_i$; but then sure enough $y$ and $x$
are both in $U_i$ (“and” is commutative!), so $y \sim x$. Similarly, if we have $x, y, z$ such
that $x \sim y$ and $y \sim z$, then there exists $i$ such that $x$ and $y$ are both in $U_i$ and a
possibly different index $j$ such that $y$ and $z$ are both in $U_j$. But since $y \in U_i \cap U_j$,
we must have $U_i = U_j$ so that $x$ and $z$ are both in $U_i = U_j$ and $x \sim z$.

Moreover, the processes of passing from an equivalence relation to a partition and
from a partition to an equivalence relation are mutually inverse: if we start with
an equivalence relation $R$, form the associated partition $\mathcal{P}(R)$, and then form the
associated equivalence relation $\sim(\mathcal{P}(R))$, then we get the equivalence relation $R$
that we started with, and similarly in the other direction.
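The round trip between an equivalence relation and its partition can be tried out on a finite set. The sketch below is an illustration under assumptions: the set is the integers from $-4$ to $4$, the relation is “same parity”, and `equivalence_classes` and `relation_from_partition` are hypothetical helper names.

```python
def equivalence_classes(X, related):
    """Group the elements of X into classes of the equivalence relation `related`."""
    classes = []
    for x in X:
        for cls in classes:
            # Comparing with one representative suffices for an equivalence relation.
            if related(x, next(iter(cls))):
                cls.add(x)
                break
        else:
            classes.append({x})
    return classes

def relation_from_partition(partition):
    """Pairs (x, y) such that x and y lie in the same piece of the partition."""
    return {(x, y) for block in partition for x in block for y in block}

X = range(-4, 5)
same_parity = lambda x, y: (x - y) % 2 == 0

classes = equivalence_classes(X, same_parity)
original = {(x, y) for x in X for y in X if same_parity(x, y)}
rebuilt = relation_from_partition(classes)
print(classes, rebuilt == original)
```

Starting from the relation, passing to its classes and then back to a relation returns the same set of ordered pairs, as the text asserts.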


1.6. Examples of equivalence relations.

Example 1.18. (Congruence modulo $n$) Let $n \in \mathbb{Z}^+$. There is a natural partition
of $\mathbb{Z}$ into $n$ parts which generalizes the partition into even and odd. Namely, we put

$Y_0 = \{\ldots, -2n, -n, 0, n, 2n, \ldots\} = \{kn \mid k \in \mathbb{Z}\}$

the set of all multiples of $n$,

$Y_1 = \{\ldots, -2n + 1, -n + 1, 1, n + 1, 2n + 1, \ldots\} = \{kn + 1 \mid k \in \mathbb{Z}\}$,

and similarly, for any $0 \leq d \leq n - 1$, we put

$Y_d = \{\ldots, -2n + d, -n + d, d, n + d, 2n + d, \ldots\} = \{kn + d \mid k \in \mathbb{Z}\}$.

That is, $Y_d$ is the set of all integers which, upon division by $n$, leave a remainder of
$d$. Earlier we showed that the remainder upon division by $n$ is a well-defined integer
in the range $0 \leq d < n$. Here by “well-defined”, I mean that for $0 \leq d_1 \neq d_2 < n$,
the sets $Y_{d_1}$ and $Y_{d_2}$ are disjoint. Recall why this is true: if not, there exist $k_1, k_2$
such that $k_1 n + d_1 = k_2 n + d_2$, so $d_1 - d_2 = (k_2 - k_1)n$, so $d_1 - d_2$ is a multiple of
$n$. But $-n < d_1 - d_2 < n$, so the only multiple of $n$ it could possibly be is 0, i.e.,
$d_1 = d_2$. It is clear that each $Y_d$ is nonempty and that their union is all of $\mathbb{Z}$, so
$\{Y_d\}_{d=0}^{n-1}$ gives a partition of $\mathbb{Z}$.

The corresponding equivalence relation is called congruence modulo $n$, and written
as follows:

$x \equiv y \pmod{n}$.

What this means is that $x$ and $y$ leave the same remainder upon division by $n$.

Proposition 1.19. For integers $x, y$, the following are equivalent:

(i) $x \equiv y \pmod{n}$.

(ii) $n \mid x - y$.

Proof. Suppose that $x \equiv y \pmod{n}$. Then they leave the same remainder, say $d$,
upon division by $n$: there exist $k_1, k_2 \in \mathbb{Z}$ such that $x = k_1 n + d$, $y = k_2 n + d$, so
$x - y = (k_1 - k_2)n$ and indeed $n \mid x - y$. Conversely, suppose that $x = k_1 n + d_1$,
$y = k_2 n + d_2$, with $d_1$ and $d_2$ distinct integers both in the interval $[0, n - 1]$. Then,
if $n$ divides $x - y = (k_1 - k_2)n + (d_1 - d_2)$, then it also divides $d_1 - d_2$, which as
above is impossible since $-n < d_1 - d_2 < n$ and $d_1 - d_2 \neq 0$. □
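Proposition 1.19 can be spot-checked on a range of integers. This is a sketch under assumptions: $n = 5$ and the sample range are arbitrary choices, and the helper names are hypothetical. It relies on the fact that Python's `%` operator returns a remainder in $0, \ldots, n - 1$ when $n$ is positive, which matches the remainder used in the text.

```python
def congruent(x, y, n):
    """x is congruent to y mod n, defined by equal remainders on division by n."""
    return x % n == y % n

def divides(n, k):
    """True if n divides k."""
    return k % n == 0

n = 5
sample = range(-12, 13)

# Condition (i) and condition (ii) agree for every pair in the sample.
agree = all(congruent(x, y, n) == divides(n, x - y) for x in sample for y in sample)

# The classes Y_0, ..., Y_{n-1}, restricted to the sample.
residue_classes = {d: [x for x in sample if x % n == d] for d in range(n)}
print(agree, residue_classes[0])
```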

Example 1.20. (Fibers of a function) Let $f : X \to Y$ be a function. We define a
relation $R$ on $X$ by $(x_1, x_2) \in R$ iff $f(x_1) = f(x_2)$. This is an equivalence relation.
The equivalence class $[x]$ of $x$ is called the fiber over $f(x)$.

1.7. Extra: composition of relations. 

Suppose we have a relation R ⊆ X × Y and a relation S ⊆ Y × Z. We can define a
composite relation S ∘ R ⊆ X × Z in a way which will generalize compositions
of functions. Compared to composition of functions, composition of relations is
much less well-known, although as with many abstract concepts, once it is pointed
out to you, you begin to see it “in nature”. This section is certainly optional reading.

The definition is simply this: 

S ∘ R = {(x, z) ∈ X × Z | ∃ y ∈ Y such that (x, y) ∈ R and (y, z) ∈ S}.

In other words, we say that x in the first set X relates to z in the third set Z if
there exists at least one intermediate element y in the second set such that x relates
to y and y relates to z.

In particular, we can always compose relations on a single set X. As a special 
case, given a relation R, we can compose it with itself: say 

R^(2) = R ∘ R = {(x, z) ∈ X × X | ∃ y ∈ X such that xRy and yRz}.

Proposition 1.21. For a relation R on X, the following are equivalent: 

(i) R is transitive. 

(ii) R^(2) ⊆ R.

Exercise 1.8. Show that the composition of relations is associative. 

Exercise 1.9. Show: (S ∘ R)^(-1) = R^(-1) ∘ S^(-1).

Exercise 1.10. Let X = {1, ..., N}. To a relation R on X we associate its
adjacency matrix M = M(R): if (i, j) ∈ R, we put M(i, j) = 1; otherwise we
put M(i, j) = 0. Show that the adjacency matrix of the composite relation R^(2) is
the product matrix M(R) · M(R) in the sense of linear algebra.
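
Composition of relations on a finite set is easy to experiment with. The sketch below is illustrative only: the function names and the two sample relations are assumptions for the example. It checks Proposition 1.21 on those samples and compares R ∘ R with the matrix product from Exercise 1.10. One caveat the code makes explicit: with ordinary integer arithmetic the (i, j) entry of M(R) · M(R) counts the intermediate elements y, so the comparison below is made on the pattern of nonzero entries.

```python
def compose(S, R):
    """Return S o R = {(x, z) : there is y with (x, y) in R and (y, z) in S}."""
    return {(x, z) for (x, y1) in R for (y2, z) in S if y1 == y2}


def is_transitive(R):
    return all((x, z) in R for (x, y1) in R for (y2, z) in R if y1 == y2)


def adjacency(R, N):
    """0/1 matrix with entry 1 in row i, column j iff (i, j) is in R, for X = {1, ..., N}."""
    return [[1 if (i, j) in R else 0 for j in range(1, N + 1)]
            for i in range(1, N + 1)]


def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


N = 4
less_than = {(i, j) for i in range(1, N + 1) for j in range(1, N + 1) if i < j}
successor = {(i, i + 1) for i in range(1, N)}

# Proposition 1.21: R is transitive iff R o R is a subset of R.
for R in (less_than, successor):
    assert is_transitive(R) == (compose(R, R) <= R)

# Exercise 1.10, read as a statement about which entries are nonzero.
square = mat_mul(adjacency(less_than, N), adjacency(less_than, N))
nonzero_pattern = [[1 if entry > 0 else 0 for entry in row] for row in square]
assert nonzero_pattern == adjacency(compose(less_than, less_than), N)
```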

2. Functions 

Let X and Y be sets. A function f : X → Y is a special kind of relation between
X and Y. Namely, it is a relation R ⊆ X × Y satisfying the following condition:
for all x ∈ X there exists exactly one y ∈ Y such that (x, y) ∈ R. Because the
element y of Y attached to a given element x of X is unique, we may denote it by f(x).

Geometrically, a function is a relation which passes the vertical line test: every
vertical line x = c intersects the graph of the function in exactly one point. In
particular, the domain of any function is all of X.

Example 2.1. The equality relation {(x, x) | x ∈ X} on X is a function: f(x) = x
for all x. We call this the identity function and denote it by 1_X.

Example 2.2. a) Let Y be a set. Then ∅ × Y = ∅, so there is a unique relation
on ∅ × Y. This relation is - vacuously - a function.

b) Let X be a set. Then X × ∅ = ∅, so there is a unique relation on X × ∅, with
domain ∅. If X = ∅, then we get the empty function f : ∅ → ∅. If X ≠ ∅ then
the domain is not all of X so we do not get a function.

If f : X → Y is a function, the second set Y is called the codomain of f. Note the
asymmetry in the definition of a function: although every element x of the domain 
X is required to be associated to a unique element y of Y, the same is not required 
of elements y of the codomain: there may be multiple elements x in X such that 
f(x) = y, or there may be none at all. 

The image of f : X → Y is {y ∈ Y such that y = f(x) for some x ∈ X}.⁴

In calculus one discusses functions with domain some subset of R and codomain R. 
Moreover in calculus a function is usually (but not always...) given by some relatively
simple algebraic/analytic expression, and the convention is that the domain
is the largest subset of R on which the given expression makes sense. 

⁴Some people call this the range, but also some people call the set Y (what we called the
codomain) the range, so the term is ambiguous and perhaps best avoided.

Example 2.3.

a) The function y = 3x is a function from R to R. Its range is all of R.

b) The function y = x^2 is a function from R to R. Its range is [0, ∞).

c) The function y = x^3 is a function from R to R. Its range is all of R.

d) The function y = √x is a function from [0, ∞) to R. Its range is [0, ∞).

e) The arctangent y = arctan x is a function from R to R. Its range is (-π/2, π/2).

2.1. The set of all functions from X to Y. 

Let X and Y be sets. We denote the set of all functions f : X → Y by Y^X.
Why such a strange notation? The following simple and useful result gives the
motivation. Recall that for n ∈ Z^+, we put [n] = {1, 2, ..., n}, and we also put
[0] = ∅. Thus #[n] = n for all n ∈ N.

Proposition 2.4. Let m, n ∈ N. Then we have

#([m]^[n]) = m^n.

In words: the set of all functions from {1, ..., n} to {1, ..., m} has cardinality m^n.

Proof. To define a function f : {1, ..., n} → {1, ..., m}, we must specify a sequence
of elements f(1), ..., f(n) in {1, ..., m}. There are m possible choices for f(1), also
m possible choices for f(2), and so forth, up to m possible choices for f(n), and these
choices are independent. Thus we have m · m · · · m (n times) = m^n choices overall. □
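
The counting argument in the proof can be mirrored directly in code: choosing f(1), ..., f(n) independently is exactly what a Cartesian power enumerates. The sketch below is illustrative; representing a function as a Python dict and the name all_functions are assumptions made for the example.

```python
from itertools import product


def all_functions(n, m):
    """Yield every function {1, ..., n} -> {1, ..., m} as a dict."""
    domain = range(1, n + 1)
    codomain = range(1, m + 1)
    # One independent choice of value for each element of the domain.
    for values in product(codomain, repeat=n):
        yield dict(zip(domain, values))


for m in range(0, 4):
    for n in range(0, 4):
        count = sum(1 for _ in all_functions(n, m))
        assert count == m ** n  # Python also uses the convention 0 ** 0 == 1
```

The case n = 0 yields exactly one function (the empty one), and the case m = 0 < n yields none, matching Example 2.2.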

2.2. Injective functions. 

From the perspective of our course, the most important material on functions consists
of the concepts of injectivity, surjectivity and bijectivity and the relation of these
properties with the existence of inverse functions.

A function f : X → Y is injective if every element y of the codomain is associated
to at most one element x ∈ X. That is, f is injective if for all x_1, x_2 ∈ X,
f(x_1) = f(x_2) implies x_1 = x_2.

Let us meditate a bit on the property of injectivity. One way to think about it 
is via a horizontal line test: a function is injective if and only if each horizontal line 
y = c intersects the graph of f in at most one point. Another way to think about
an injective function is as a function which entails no loss of information. That is,
for an injective function, if your friend tells you x ∈ X and you tell me f(x) ∈ Y,
then I can, in principle, figure out what x is because it is uniquely determined.

Consider for instance the two functions f(x) = x^2 and f(x) = x^3. The first
function f(x) = x^2 is not injective: if y is any positive real number then there are
two x-values such that f(x) = y, namely x = √y and x = -√y. Or, in other words, if
f(x) = x^2 and I tell you that f(x) = 1, then you are in doubt as to what x is: it
could be either +1 or -1. On the other hand, f(x) = x^3 is injective, so if I tell you
that f(x) = x^3 = 1, then you can conclude that x = 1.

How can we verify in practice that a function is injective? One way is to construct
an inverse function, which we will discuss further later. But in the special
case when f : R → R is a continuous function, the methods of calculus give useful
criteria for injectivity.

Before stating the result, let us first recall the definitions of increasing and decreasing
functions. A function f : R → R is (strictly) increasing if for all
x_1, x_2 ∈ R, x_1 < x_2 ⟹ f(x_1) < f(x_2). Similarly, f is (strictly) decreasing if
for all x_1, x_2 ∈ R, x_1 < x_2 ⟹ f(x_1) > f(x_2). Notice that a function which is
increasing or decreasing is injective. The “problem” is that a function need not be
either increasing or decreasing, although “well-behaved” functions of the sort one
encounters in calculus have the property that their domain can be broken up into
intervals on which the function is either increasing or decreasing. For instance, the
function f(x) = x^2 is decreasing on (-∞, 0) and increasing on (0, ∞).

Theorem 2.5. Let f : R → R be a continuous function.

a) If f is injective, then f is either increasing or decreasing.

b) If f is differentiable and either f'(x) > 0 for all x ∈ R or f'(x) < 0 for all
x ∈ R, then f is injective.

It is something of a sad reflection on our calculus curriculum that useful and basic 
facts like this are not established in a standard calculus course. However, the full 
details are somewhat intricate. We sketch a proof below. 

Proof. We prove part a) by contraposition: that is, we assume that f is continuous
and neither increasing nor decreasing, and we wish to show that it is not injective.
Since f is not decreasing, there exist x_1 < x_2 such that f(x_1) ≤ f(x_2). Since f is
not increasing, there exist x_3 < x_4 such that f(x_3) ≥ f(x_4). If f(x_1) = f(x_2) or
f(x_3) = f(x_4), then f is visibly not injective, so we may assume that both
inequalities are strict. We claim that it follows that there exist a < b < c such that either
Case 1: f(b) > f(a) and f(b) > f(c), or
Case 2: f(b) < f(a) and f(b) < f(c).

This follows from a somewhat tedious consideration of cases as to in which order the
four points x_1, x_2, x_3, x_4 occur, which we omit here. Now we apply the Intermediate
Value Theorem to f on the intervals [a, b] and [b, c]. In Case 1, every number smaller
than f(b) but sufficiently close to it is assumed both on the interval [a, b] and again
on the interval [b, c], so f is not injective. In Case 2, every number larger than f(b)
but sufficiently close to it is assumed both on the interval [a, b] and again on [b, c],
so again f is not injective.

As for part b), we again go by contraposition and assume that f is not injective:
that is, we suppose that there exist a < b such that f(a) = f(b). Applying the
Mean Value Theorem to f on [a, b], we get that there exists c, a < c < b, such that

f'(c) = (f(b) - f(a)) / (b - a) = 0,

contradicting the assumption that f'(x) is always positive or always negative. □

Remark: The proof shows that we could have replaced the hypothesis of part b) with the
apparently weaker hypothesis that for all x ∈ R, f'(x) ≠ 0. However, it can be shown that
this is equivalent to f' always being positive or always being negative, a consequence of
the Intermediate Value Theorem For Derivatives.

Example 2.6. a) Let f : R → R by f(x) = arctan x. We claim f is injective.
Indeed, it is differentiable and its derivative is f'(x) = 1/(1 + x^2) > 0 for all x ∈ R.
Therefore f is strictly increasing, hence injective.

b) Let f : R → R by f(x) = -x^3 - x. We claim f is injective. Indeed, it is
differentiable and its derivative is f'(x) = -3x^2 - 1 = -(3x^2 + 1) < 0 for all x ∈ R.
Therefore f is strictly decreasing, hence injective.

Example 2.7. Let f : R → R be given by f(x) = x^3. One meets this function in
precalculus and calculus mathematics, and one certainly expects it to be injective.
Unfortunately the criterion of Theorem 2.5 falls a bit short here: the derivative is
f'(x) = 3x^2, which is always non-negative but is 0 at x = 0.

We will show “by hand” that f is indeed injective. Namely, let x_1, x_2 ∈ R and
suppose x_1^3 = x_2^3. Then

0 = x_1^3 - x_2^3 = (x_1 - x_2)(x_1^2 + x_1 x_2 + x_2^2).

Seeking a contradiction, we suppose that x_1 ≠ x_2. Then x_1 - x_2 ≠ 0, so we can
divide through by it, getting

0 = x_1^2 + x_1 x_2 + x_2^2 = (x_1 + x_2/2)^2 + (3/4) x_2^2.

Because each of the two terms in the sum is always non-negative, the only way the
sum can be zero is if

(x_1 + x_2/2)^2 = (3/4) x_2^2 = 0.

The second equality implies x_2 = 0, and plugging this into the first equality gives
x_1^2 = 0 and thus x_1 = 0. So x_1 = 0 = x_2: contradiction.

We gave a proof of the injectivity of f : x ↦ x^3 to nail down the fact that Theorem
2.5 gives a sufficient but not necessary criterion for a differentiable function to be
injective. But we would really like to be able to improve Theorem 2.5 so as to handle
this example via the methods of calculus. For instance, let n be a positive integer.
Then we equally well believe that the function f : R → R by f(x) = x^(2n+1) should
be injective. It is possible to show this using the above factorization method... but
it is real work to do so. The following criterion comes to the rescue to do this and
many other examples easily.

Theorem 2.8. Let f : R → R be a differentiable function.

a) Suppose that f'(x) ≥ 0 for all x and that there is no a < b such that f'(x) = 0
for all x ∈ (a, b). Then f is strictly increasing (hence injective).

b) Suppose that f'(x) ≤ 0 for all x and that there is no a < b such that f'(x) = 0
for all x ∈ (a, b). Then f is strictly decreasing (hence injective).

Proof. We prove part a); the proof of part b) is identical. Again we go by contrapositive:
suppose that f is not strictly increasing, so that there exist a < b
such that f(a) ≥ f(b). If f(a) > f(b), then applying the Mean Value Theorem, we
get a c in between a and b such that f'(c) < 0, contradiction. So we may assume
that f(a) = f(b). Then, by exactly the same MVT argument, f'(x) ≥ 0 for all x
implies that f is at least weakly increasing, i.e., x_1 ≤ x_2 ⟹ f(x_1) ≤ f(x_2). But
a weakly increasing function f with f(a) = f(b) must be constant on the entire
interval [a, b], hence f'(x) = 0 for all x in (a, b), contradicting the hypothesis. □
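
Theorem 2.8 covers odd powers x ↦ x^(2n+1), whose derivative (2n + 1)x^(2n) is non-negative and vanishes only at x = 0. As a sanity check (not a proof), the sketch below evaluates a few odd powers on an integer grid, where the arithmetic is exact, and confirms that the values are strictly increasing, while squaring repeats values. The grid and the helper name are assumptions for illustration.

```python
def is_strictly_increasing(values):
    """True iff each entry is strictly smaller than the next one."""
    return all(a < b for a, b in zip(values, values[1:]))


grid = list(range(-10, 11))

for n in range(1, 4):
    odd_power = [x ** (2 * n + 1) for x in grid]
    assert is_strictly_increasing(odd_power)
    assert len(set(odd_power)) == len(grid)  # no value is taken twice

# By contrast, x -> x^2 takes the same value at x and -x.
squares = [x ** 2 for x in grid]
assert len(set(squares)) < len(grid)
```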

Worked Exercise 2.1. We will show that for any n ∈ Z^+, the function f : R → R
given by x ↦ x^(2n+1) is injective. Indeed we have f'(x) = (2n + 1)x^(2n), which is
non-negative for all x ∈ R and is 0 only at x = 0. So Theorem 2.8a) applies to show
that f is strictly increasing, hence injective.

2.3. Surjective functions.

A function f : X → Y is surjective if its image f(X) is equal to
the codomain Y. More plainly, for all y ∈ Y, there is x ∈ X such that f(x) = y.

In many ways surjectivity is the “dual property” to injectivity. For instance, it 
can also be verified by a horizontal line test: a function f is surjective if and only
if each horizontal line y = c intersects the graph of f in at least one point.

Worked Exercise 2.2. Let m and b be real numbers. Is f(x) = mx + b surjective?

Solution: It is surjective if and only if m ≠ 0. First, if m = 0, then f(x) = b
is a constant function: it maps all of R to the single point b and therefore is at the
opposite extreme from being surjective. Conversely, if m ≠ 0, write y = mx + b
and solve for x: x = (y - b)/m. Note that this argument also shows that if m ≠ 0, f is
injective: given an arbitrary y, we have solved for a unique value of x.

By the intermediate value theorem, if a continuous function f : R → R takes on
two values m < M, then it also takes on every value in between. In particular, if
a continuous function takes on arbitrarily large values and arbitrarily small values, 
then it is surjective. 

Theorem 2.9. Let a_0, ..., a_n ∈ R and suppose a_n ≠ 0. Let P : R → R by

P(x) = a_n x^n + ... + a_1 x + a_0.

Thus P is a polynomial of degree n. Then: P is surjective if and only if n is odd.

Proof. Suppose that n is odd. Then, if the leading coefficient a_n is positive, then

lim_{x→∞} P(x) = +∞, lim_{x→-∞} P(x) = -∞,

whereas if the leading coefficient a_n is negative, then

lim_{x→∞} P(x) = -∞, lim_{x→-∞} P(x) = +∞,

so either way P takes on arbitrarily large and small values. By the Intermediate
Value Theorem, its range must be all of R.

Now suppose n is even. Then if a_n is positive, we have

lim_{x→∞} P(x) = lim_{x→-∞} P(x) = +∞.

It follows that there exists a non-negative real number M such that if |x| ≥ M,
P(x) ≥ 0. On the other hand, since the restriction of P to [-M, M] is a continuous
function on a closed interval, it is bounded below: there exists a real number m
such that P(x) ≥ m for all x ∈ [-M, M]. Therefore P(x) ≥ min(m, 0) for all x, so it is not
surjective. Similarly, if a_n is negative, we can show that P is bounded above so is
not surjective. □

2.4. Bijective functions. 

A function f : X → Y is bijective if it is both injective and surjective.

Exercise 2.3. Show: for any set X, the identity function 1_X : X → X by 1_X(x) = x
is bijective.

Exercise 2.4. Determine which of the functions introduced so far in this section
are bijective.

A function is bijective iff for every y ∈ Y, there exists a unique x ∈ X such that
f(x) = y.

The following result is easy but of the highest level of importance. 

Theorem 2.10. For a function f : X → Y, the following are equivalent:

(i) f is bijective.

(ii) The inverse relation f^(-1) : Y → X = {(f(x), x) | x ∈ X} is itself a function.

Proof. Indeed, we need f to be surjective so that the domain of f^(-1) is all of Y and
we need it to be injective so that each y in Y is associated to no more than one x
value. □
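
For functions between finite sets, Theorem 2.10 can be tested mechanically. In the sketch below a function is modelled as a Python dict and a relation as a set of pairs; these representations and all helper names are assumptions made for illustration.

```python
def is_injective(f):
    """f is a dict representing a function on the finite set f.keys()."""
    return len(set(f.values())) == len(f)


def is_surjective(f, codomain):
    return set(f.values()) == set(codomain)


def inverse_relation(f):
    """The set of pairs (f(x), x)."""
    return {(y, x) for x, y in f.items()}


def relation_is_function(R, domain):
    """True iff every element of `domain` is the first entry of exactly one pair."""
    return all(sum(1 for (a, _) in R if a == d) == 1 for d in domain)


Y = {"a", "b", "c"}
examples = [
    {1: "a", 2: "b", 3: "c"},          # bijective
    {1: "a", 2: "a", 3: "b"},          # neither injective nor surjective
    {1: "a", 2: "b"},                  # injective, not surjective
    {1: "a", 2: "b", 3: "c", 4: "c"},  # surjective, not injective
]
for f in examples:
    bijective = is_injective(f) and is_surjective(f, Y)
    # Theorem 2.10: f is bijective iff the inverse relation is a function Y -> X.
    assert bijective == relation_is_function(inverse_relation(f), Y)
```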

2.5. Composition of functions. 

Probably the most important and general property of functions is that they can, 
under the right circumstances, be composed. For instance, in calculus, complicated
functions are built up out of simple functions by plugging one function into another,
e.g. √(x^2 + 1), or e^(sin x), and the most important differentiation rule - the Chain Rule
- tells how to find the derivative of a composition of two functions in terms of the 
derivatives of the original functions. 

Let f : X → Y and g : Y → Z: that is, the codomain of f is equal to the
domain of g. Then we can define a new function g ∘ f : X → Z by:

x ↦ g(f(x)).

Remark: Note that g ∘ f means first perform f and then perform g. Thus function
composition proceeds from right to left, counterintuitively at first. There was a
time when this bothered mathematicians enough to suggest writing functions on
the right, i.e., (x)f rather than f(x). But that time is past.

Remark: The condition for composition can be somewhat relaxed: it is not necessary
for the domain of g to equal the codomain of f. What is precisely necessary
and sufficient is that for every x ∈ X, f(x) lies in the domain of g, i.e.,

Range(f) ⊆ Domain(g).

Example: The composition of functions is generally not commutative. In fact, if
g ∘ f is defined, f ∘ g need not be defined at all. For instance, suppose f : R → R
is the function which takes every rational number to 1 and every irrational number
to 0 and g : {0, 1} → {a, b} is the function 0 ↦ b, 1 ↦ a. Then g ∘ f : R → {a, b} is
defined: it takes every rational number to a and every irrational number to b. But
f ∘ g makes no sense at all:

f(g(0)) = f(b) = ???.

Remark: Those who have taken linear algebra will notice the analogy with the
multiplication of matrices: if A is an m × n matrix and B is an n × p matrix, then
the product AB is defined, an m × p matrix. But if m ≠ p, the product BA is not
defined. (In fact this is more than an analogy, since an m × n matrix A can be
viewed as a linear transformation L_A : R^n → R^m. Matrix multiplication is indeed
a special case of composition of functions.)

⁵This is a special case of the composition of relations described in §X.X, but since that was
optional material, we proceed without assuming any knowledge of that material.
Even when g ∘ f and f ∘ g are both defined - e.g. when f, g : R → R - they need not
be equal. This is again familiar from precalculus mathematics. If f(x) = x^2 and
g(x) = x + 1, then

g(f(x)) = x^2 + 1, whereas f(g(x)) = (x + 1)^2 = x^2 + 2x + 1.

On the other hand, function composition is always associative: if f : X → Y,
g : Y → Z and h : Z → W are functions, then we have

(h ∘ g) ∘ f = h ∘ (g ∘ f).

Indeed the proof is trivial, since both sides map x ∈ X to h(g(f(x))).⁶

Exercise: Let f : X → Y.

a) Show that f ∘ 1_X = f.

b) Show that 1_Y ∘ f = f.
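
The behaviour of composition described above (order matters, associativity always holds, identity functions act trivially) can be tried out with ordinary Python functions. This is a small illustrative sketch; the helper compose and the third function h are assumptions introduced for the example, and checking on a sample of inputs is not a proof.

```python
def compose(g, f):
    """Return g o f, i.e. x -> g(f(x)): apply f first, then g."""
    return lambda x: g(f(x))


def identity(x):
    return x


f = lambda x: x ** 2   # f(x) = x^2
g = lambda x: x + 1    # g(x) = x + 1
h = lambda x: 3 * x    # an extra function, used only to test associativity

# g o f and f o g differ: x^2 + 1 versus (x + 1)^2.
assert compose(g, f)(2) == 5
assert compose(f, g)(2) == 9

# Associativity and the identity laws of the exercise, on a sample of inputs.
for x in range(-5, 6):
    assert compose(compose(h, g), f)(x) == compose(h, compose(g, f))(x)
    assert compose(f, identity)(x) == f(x) == compose(identity, f)(x)
```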

2.6. Basic facts about injectivity, surjectivity and composition. 

Here we establish a small number of very important facts about how injectivity, 
surjectivity and bijectivity behave with respect to function composition. First: 

Theorem 2.11. Let f : X → Y and g : Y → Z be two functions.

a) If f and g are injective, then so is g ∘ f.

b) If f and g are surjective, then so is g ∘ f.

c) If f and g are bijective, then so is g ∘ f.

Proof. a) We must show that for all x_1, x_2 ∈ X, if g(f(x_1)) = g(f(x_2)), then
x_1 = x_2. But put y_1 = f(x_1) and y_2 = f(x_2). Then g(y_1) = g(y_2). Since g is
assumed to be injective, this implies f(x_1) = y_1 = y_2 = f(x_2). Since f is also
assumed to be injective, this implies x_1 = x_2.

b) We must show that for all z ∈ Z, there exists at least one x in X such that
g(f(x)) = z. Since g : Y → Z is surjective, there exists y ∈ Y such that g(y) = z.
Since f : X → Y is surjective, there exists x ∈ X such that f(x) = y. Then

g(f(x)) = g(y) = z.

c) Finally, if f and g are bijective, then f and g are both injective, so by part a)
g ∘ f is injective. Similarly, f and g are both surjective, so by part b) g ∘ f is
surjective. Thus g ∘ f is injective and surjective, i.e., bijective. □

Now we wish to explore the other direction: suppose we know that g ∘ f is injective,
surjective or bijective. What can we conclude about the “factor” functions f and g?

The following example shows that we need to be careful. 

Example: Let X = Z = {0}, let Y = R. Define f : X → Y by f(0) = π (or
your favorite real number; it would not change the outcome), and let g be the constant
function which takes every real number y to 0: note that this is the unique
function from R to {0}. We compute g ∘ f: g(f(0)) = g(π) = 0. Thus g ∘ f is the
identity function on X: in particular it is bijective. However, both f and g are far
from being bijective: the range of f is only a single point {π}, so f is not surjective,
whereas g maps every real number to 0, so is not injective.

⁶As above, this provides a conceptual reason behind the associativity of matrix multiplication.

On the other hand, something is true: namely the “inside function” f is injective,
and the outside function g is surjective. This is in fact a general phenomenon.

Theorem 2.12. (Green and Brown Fact) Let f : X → Y and g : Y → Z be
functions.

a) If g ∘ f is injective, then f is injective.

b) If g ∘ f is surjective, then g is surjective.

c) If g ∘ f is bijective, then f is injective and g is surjective.

Proof. a) We proceed by contraposition: suppose that f is not injective: then there
exist x_1 ≠ x_2 in X such that f(x_1) = f(x_2). But then g(f(x_1)) = g(f(x_2)), so
that the distinct points x_1 and x_2 become equal under g ∘ f: that is, g ∘ f is not
injective.

b) Again by contraposition: suppose that g is not surjective: then there exists
z ∈ Z such that for no y in Y do we have z = g(y). But then we certainly cannot
have an x ∈ X such that z = g(f(x)), because if so taking y = f(x) shows that z
is in the range of g, contradiction.

c) If g ∘ f is bijective, it is injective and surjective, so we apply parts a) and b). □
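
Theorems 2.11 and 2.12 can be checked exhaustively for functions between small finite sets. The sketch below is illustrative only: the dict representation, the helper names, and the choice of sets of size at most 3 are all assumptions, and the final example is a finite stand-in for the cautionary example with Y = R.

```python
from itertools import product


def functions(domain, codomain):
    """Yield every function domain -> codomain as a dict."""
    domain = list(domain)
    for values in product(codomain, repeat=len(domain)):
        yield dict(zip(domain, values))


def injective(f):
    return len(set(f.values())) == len(f)


def surjective(f, codomain):
    return set(f.values()) == set(codomain)


def compose(g, f):
    """g o f as a dict: x -> g(f(x))."""
    return {x: g[f[x]] for x in f}


for nx, ny, nz in product(range(1, 4), repeat=3):
    X, Y, Z = range(nx), range(ny), range(nz)
    for f in functions(X, Y):
        for g in functions(Y, Z):
            gf = compose(g, f)
            # Theorem 2.11 a) and b)
            if injective(f) and injective(g):
                assert injective(gf)
            if surjective(f, Y) and surjective(g, Z):
                assert surjective(gf, Z)
            # Theorem 2.12 a) and b)
            if injective(gf):
                assert injective(f)
            if surjective(gf, Z):
                assert surjective(g, Z)

# g o f bijective, yet f is not surjective and g is not injective.
Y, Z = range(3), range(2)
f = {0: 0, 1: 1}
g = {0: 0, 1: 1, 2: 1}
assert injective(compose(g, f)) and surjective(compose(g, f), Z)
assert not surjective(f, Y) and not injective(g)
```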

Remark: The name of Theorem 2.12 comes from the Spring 2009 version of Math 
3200, when I presented this result using green and brown chalk, decided it was 
important enough to have a name, and was completely lacking in inspiration. 

2.7. Inverse Functions. 

Finally we come to the last piece of the puzzle: let f : X → Y be a function.
We know that the inverse relation f^(-1) is a function if and only if f is injective and
surjective. But there is another (very important) necessary and sufficient condition
for invertibility in terms of function composition. Before stating it, recall that for
a set X, the identity function 1_X is the function from X to X such that 1_X(x) = x
for all x ∈ X. (Similarly 1_Y(y) = y for all y ∈ Y.)

We say that a function g : Y → X is the inverse function to f : X → Y if
both of the following hold:

(IF1) g ∘ f = 1_X, i.e., for all x ∈ X, g(f(x)) = x.

(IF2) f ∘ g = 1_Y, i.e., for all y ∈ Y, f(g(y)) = y.

In other words, g is the inverse function to f if applying one function and then
the other - in either order! - brings us back where we started.

The point here is that g is supposed to be related to f^(-1), the inverse relation.
Here is the precise result: 

Theorem 2.13. Let f : X → Y.

a) The following are equivalent:

(i) f is bijective.

(ii) The inverse relation f^(-1) : Y → X is a function.

(iii) f has an inverse function g.

b) When the equivalent conditions of part a) hold, then the inverse function g is
uniquely determined: it is the function f^(-1).

Proof. a) We already know the equivalence of (i) and (ii): this is Theorem 2.10
above.

(ii) ⟹ (iii): Assume (ii), i.e., that the inverse relation f^(-1) is a function. We
claim that it is then the inverse function to f in the sense that f^(-1) ∘ f = 1_X and
f ∘ f^(-1) = 1_Y. We just do it: for x ∈ X, f^(-1)(f(x)) is the unique element of X
which gets mapped under f to f(x): since x is such an element and the uniqueness
is assumed, we must have f^(-1)(f(x)) = x. Similarly, for y ∈ Y, f^(-1)(y) is the unique
element x of X such that f(x) = y, so f(f^(-1)(y)) = f(x) = y.

(iii) ⟹ (i): We have g ∘ f = 1_X, and the identity function is bijective. By
the Green and Brown Fact, this implies that f is injective. Similarly, we have
f ∘ g = 1_Y, which is bijective, so by the Green and Brown Fact, this implies that f is
surjective. Therefore f is bijective.⁷

b) Suppose that we have any function g : Y → X such that g ∘ f = 1_X and
f ∘ g = 1_Y. By the proof of part a), we know that f is bijective and thus the inverse
relation f^(-1) is a function such that f^(-1) ∘ f = 1_X, f ∘ f^(-1) = 1_Y. Thus

g = g ∘ 1_Y = g ∘ (f ∘ f^(-1)) = (g ∘ f) ∘ f^(-1) = 1_X ∘ f^(-1) = f^(-1).

□ 

In summary, for a function f, being bijective, having the inverse relation (obtained
by “reversing all the arrows”) be a function, and having another function g which 
undoes / by composition in either order, are all equivalent. 
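
Conditions (IF1) and (IF2) are genuinely two separate requirements, which a small finite example makes visible. The sketch below models functions as Python dicts; the helper name is_inverse and the sample functions are assumptions for illustration.

```python
def is_inverse(g, f, X, Y):
    """Return the pair (IF1 holds, IF2 holds) for g : Y -> X and f : X -> Y."""
    if1 = all(g[f[x]] == x for x in X)  # g(f(x)) = x for all x in X
    if2 = all(f[g[y]] == y for y in Y)  # f(g(y)) = y for all y in Y
    return if1, if2


# A bijection, and the function obtained by reversing all its arrows.
X, Y = {1, 2, 3}, {"a", "b", "c"}
f = {1: "b", 2: "c", 3: "a"}
f_inv = {y: x for x, y in f.items()}
assert is_inverse(f_inv, f, X, Y) == (True, True)

# When f is injective but not surjective, a g satisfying (IF1) can exist,
# but (IF2) fails.
X2, Y2 = {0}, {0, 1}
f2 = {0: 0}
g2 = {0: 0, 1: 0}
assert is_inverse(g2, f2, X2, Y2) == (True, False)
```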


⁷A very similar argument shows that g is bijective as well.

Chapter five:
Sets and functions

1.1 Section sets 


The basic concepts of sets and functions are topics covered in high school math courses
and are thus familiar to most university students. We take the intuitive point of view that
sets are unordered collections of objects. We first recall some standard terminology and
notation associated with sets. When we speak about sets, we usually have a “universal set”
U in mind, to which the various sets of our discourse belong.

Definition 1 (Set notation) A set is an unordered collection of distinct objects. We
use the notation x ∈ S to mean “x is an element of S” and x ∉ S to mean “x is not an
element of S.” Given two subsets (subcollections) of U, X and Y, we say “X is a subset
of Y,” written X ⊆ Y, if x ∈ X implies that x ∈ Y. Alternatively, we may say that “Y is
a superset of X.” X ⊆ Y and Y ⊇ X mean the same thing. We say that two subsets X
and Y of U are equal if X ⊆ Y and Y ⊆ X. We use braces to designate sets when we wish
to specify or describe them in terms of their elements: A = {a, b, c}, B = {2, 4, 6, ...}. A
set with k elements is called a k-set or set with cardinality k. The cardinality of a set A is
denoted by |A|.

Since a set is an unordered collection of distinct objects, the following all describe the
same 3-element set:

{a, b, c} = {b, a, c} = {c, b, a} = {a, b, b, c, b}.

The first three are simply listing the elements in a different order. The last happens to
mention some elements more than once. But, since a set consists of distinct objects, the
elements of the set are still just a, b, c. Another way to think of this is:

Two sets A and B are equal if and only if every element of A is an
element of B and every element of B is an element of A.

Thus, with A = {a, b, c} and B = {a, b, b, c, b}, we can see that everything in A is in B and
everything in B is in A. You might think “When we write a set, the elements are in the
order written, so why do you say a set is not ordered?” When we write something down
we’re stuck: we have to list them in some order. You can think of a set differently: write
each element on a separate slip of paper and put the slips in a paper bag. No matter how
you shake the bag, it’s still the same set.

If we are given that A is a set and no other information about A, then there is no
ordering to the elements of A. Thus, we cannot speak of “the second element of the set A”
unless we have specified an ordering of the elements of A. If we wish to regard A as ordered
in some way, then we specify this fact explicitly: “The elements of A are ordered a, b, c,”
or “A = (a, b, c).” The latter notation replaces the braces with parentheses and designates
that A is ordered, left to right, as indicated. We call this an ordered set. An ordered set
is also called a linear order. Various other names are also used: list, vector, string,
all with no repeated elements.¹ Of course, you’ve seen repeated elements in vectors,
for example the point in the plane at the coordinates (1, 1). That’s fine, it’s just not an
ordered set. If there are k elements in the ordered set, it is referred to as a k-list, k-vector,
etc., or as a list, vector, etc., of length k, all with no repeated elements because they are
ordered sets.
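
The distinction between sets and ordered sets has a rough analogue in Python, where the built-in set type ignores order and repetition while tuples record both. The sketch below is only an analogy; in particular the helper is_ordered_set is a hypothetical name introduced here to express the "no repeated elements" requirement.

```python
# Sets ignore order and repetition.
assert {"a", "b", "c"} == {"b", "a", "c"} == {"a", "b", "b", "c", "b"}
assert len({"a", "b", "b", "c", "b"}) == 3

# Tuples record order, so rearranging the entries gives a different tuple.
assert ("a", "b", "c") != ("b", "a", "c")


def is_ordered_set(items):
    """A tuple counts as an ordered set here iff it has no repeated entries."""
    return len(set(items)) == len(items)


assert is_ordered_set(("a", "b", "c"))
assert not is_ordered_set((1, 1))  # the point (1, 1) is a vector, not an ordered set
```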

Sometimes we cannot list the elements of a set explicitly. What do we do if we want 
to describe the set of all real numbers greater than 1 without writing it out in words? We 
write 

{x | x ∈ R, x > 1} or {x | x > 1} or {x : x > 1}.

These are read “the set of all x such that ...” In the first example we mentioned that x was 
a real number (x ∈ R). In the other two we didn’t because we assumed the reader knew
from context that we were talking about real numbers. 

For the most part, we shall be dealing with finite sets. Let U be a set and let A and
B be subsets of U. The sets

A ∩ B = {x | x ∈ A and x ∈ B}

and

A ∪ B = {x | x ∈ A or x ∈ B}

are the intersection and union of A and B. The set A \ B or A - B is the set difference
of A and B (i.e., the set {x | x ∈ A, x ∉ B}). The set U \ A (also A^c, A' or ~A) is the
complement of A (relative to U). Note that A - B = {x | x ∈ A, x ∉ B} = A ∩ B^c. The
empty set, denoted by ∅, equals U^c. Also note that, for any set A ⊆ U, A ∪ A^c = U and
A ∩ A^c = ∅.

The set A ⊕ B = (A \ B) ∪ (B \ A) is the symmetric difference of A and B. We use
A × B = {(x, y) | x ∈ A, y ∈ B} to denote the product or Cartesian product of A and B.
If we want to consider the product of k sets, A_1, ..., A_k, this is denoted by ×_{i=1}^{k} A_i. If we
want to consider the product of a set A with itself k times, we write ×^k A.
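
Each of these operations has a direct counterpart on Python's built-in sets, which gives a quick way to experiment with them. The particular sets U, A, B below and the helper complement are assumptions chosen for illustration.

```python
from itertools import product

U = set(range(1, 11))  # a small universal set
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}


def complement(S, universe=U):
    """Complement of S relative to the universal set."""
    return universe - S


assert A & B == {3, 4}                              # intersection
assert A | B == {1, 2, 3, 4, 5, 6}                  # union
assert A - B == {1, 2} == A & complement(B)         # set difference
assert A ^ B == (A - B) | (B - A) == {1, 2, 5, 6}   # symmetric difference
assert A | complement(A) == U
assert A & complement(A) == set()

pairs = set(product(A, B))            # Cartesian product A x B
assert len(pairs) == len(A) * len(B)
triples = set(product(A, repeat=3))   # product of A with itself 3 times
assert len(triples) == len(A) ** 3
```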


Set Properties and Proofs 


The algebraic rules for operating with sets are also familiar to most beginning university 
students. Here is such a list of the basic rules. In each case the standard name of the rule 
is given first, followed by the rule as applied first to ∩ and then to ∪.


¹Why is it okay to specify a set S = {a, b, c, a} where the element a has been repeated,
but it is not okay to have repeated elements in an ordering of S? When we say S =
{a, b, c, a}, we know that S contains just the three elements a, b and c. If we were to talk
about the ordered set (a, b, c, a) it would not make sense because it would say that the
element a is in two places at once: the first position and the last position.

Theorem 1 (Algebraic rules for sets) The universal set U is not mentioned explicitly
but is implicit when we use the notation ~X = U - X for the complement of X. An
alternative notation is X^c = ~X.


Associative:      (P ∩ Q) ∩ R = P ∩ (Q ∩ R)            (P ∪ Q) ∪ R = P ∪ (Q ∪ R)
Distributive:     P ∩ (Q ∪ R) = (P ∩ Q) ∪ (P ∩ R)      P ∪ (Q ∩ R) = (P ∪ Q) ∩ (P ∪ R)
Idempotent:       P ∩ P = P                            P ∪ P = P
Double Negation:  ~~P = P
DeMorgan:         ~(P ∩ Q) = ~P ∪ ~Q                   ~(P ∪ Q) = ~P ∩ ~Q
Absorption:       P ∪ (P ∩ Q) = P                      P ∩ (P ∪ Q) = P
Commutative:      P ∩ Q = Q ∩ P                        P ∪ Q = Q ∪ P


These rules are “algebraic” rules for working with ∩, ∪, and ~. You should memorize them
as you use them. They are used just like rules in ordinary algebra: whenever you see an 
expression on one side of the equal sign, you can replace it by the expression on the other 
side. 

When we wrote “P ∩ Q ∩ R” you may have wondered if we meant “(P ∩ Q) ∩ R” or
“P ∩ (Q ∩ R).” The associative law says it doesn’t matter. That is why you will see the
notation P ∩ Q ∩ R or P ∪ Q ∪ R without anyone getting excited about it. On the other
hand P ∩ (Q ∪ R) and (P ∩ Q) ∪ R may not be equal, so we need parentheses here.
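
The algebraic rules can be spot-checked by brute force over all subsets of a small universal set. The sketch below does this for a three-element universe; the universe, the helper comp, and the final counterexample are assumptions chosen for illustration, and such a check is evidence rather than a proof.

```python
from itertools import combinations

U = frozenset({0, 1, 2})
subsets = [frozenset(c) for r in range(len(U) + 1) for c in combinations(U, r)]


def comp(S):
    """Complement relative to U."""
    return U - S


for P in subsets:
    assert comp(comp(P)) == P                         # double negation
    for Q in subsets:
        assert comp(P & Q) == comp(P) | comp(Q)       # DeMorgan
        assert comp(P | Q) == comp(P) & comp(Q)
        assert P | (P & Q) == P == P & (P | Q)        # absorption
        for R in subsets:
            assert P & (Q | R) == (P & Q) | (P & R)   # distributive
            assert P | (Q & R) == (P | Q) & (P | R)

# Parentheses matter when the two operations are mixed.
P, Q, R = frozenset(), frozenset({0}), frozenset({1})
assert P & (Q | R) != (P & Q) | R
```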

The best way to “prove” the rules or to understand their validity is through the 
geometric device of a Venn diagram. 


Example 1 (Venn diagrams and proofs of set equations) Here is a Venn diagram 
for three sets, P, Q, and R, with universal set U: 



The three oval regions labeled P, Q, and R represent the sets of those names. The 
rectangular region represents the universal set U. There are eight subregions, labeled 1 
through 8 in the picture. Region 8 represents the subset P ∩ Q ∩ R; region 1 represents 
U − (P ∪ Q ∪ R); region 2 represents the elements of Q − (P ∪ R); and so on. 

Let’s use the above Venn diagram to verify that the distributive rule, P ∪ (Q ∩ R) = 
(P ∪ Q) ∩ (P ∪ R), is valid. The idea is to replace the sets P, Q, and R by their corresponding 
sets of regions from the Venn diagram. Thus, Q is replaced by {2, 5, 6, 8}, P is replaced 
by {4, 6, 7, 8}, and R is replaced by {3, 5, 7, 8}. Even though the sets P, Q, and R are 
arbitrary, perhaps even infinite, the distributive rule reduces to verifying the same rule for 
these simplified sets: 

{4, 6, 7, 8} ∪ ({2, 5, 6, 8} ∩ {3, 5, 7, 8}) = ({4, 6, 7, 8} ∪ {2, 5, 6, 8}) ∩ ({4, 6, 7, 8} ∪ {3, 5, 7, 8}). 

This identity is trivial to check directly: both sides reduce to the set {4, 5, 6, 7, 8}. 

This “Venn diagram” approach reduces a set identity that involves potentially infinitely 
many elements to subsets of a set of eight elements. It is fine for proofs and especially good 
for checking out “set identities” to see quickly if they are true or not. For example, is it 
true that Q − (P ∩ R) = Q − (P ∩ Q ∩ R)? Checking the Venn diagram shows that both 
sides correspond to the set of regions {2, 5, 6}. The identity is true. You will get a chance 
to practice this technique in the exercises. □ 
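
If you like to check such computations by machine, the region sets of Example 1 translate 
directly into Python sets. The following is only a sketch of the bookkeeping, using the region 
labels quoted in the text; the variable names are our own. 

```python
# Each set is replaced by the set of Venn regions it covers (labels from Example 1).
P = {4, 6, 7, 8}
Q = {2, 5, 6, 8}
R = {3, 5, 7, 8}

# Distributive rule: P ∪ (Q ∩ R) = (P ∪ Q) ∩ (P ∪ R)
left = P | (Q & R)
right = (P | Q) & (P | R)
print(left == right, sorted(left))

# Second identity from the text: Q − (P ∩ R) = Q − (P ∩ Q ∩ R)
diff_left = Q - (P & R)
diff_right = Q - (P & Q & R)
print(diff_left == diff_right, sorted(diff_left))
```

Here `|`, `&` and `-` are Python's union, intersection and set difference. 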


There are, of course, other ways to verify set identities. One way is called the element 
method: 


Example 2 (The element method for proofs of set equations) To use that method, 
you simply translate the identity X = Y into basic statements about what conditions a 
single element must satisfy to be (first) in the set on the left and then (second) in the set 
on the right. Thus, to show that X = Y, you assert that if x ∈ X then blah, blah, blah (a 
bunch of words that make sense) implies that x ∈ Y. This shows that X ⊆ Y. Then, you 
reverse the argument and assert that if y ∈ Y then blah, blah, blah (a bunch of words that 
make sense) implies that y ∈ X. This shows that Y ⊆ X. Thus X = Y. 

Here is an example. Show, by the element method, that for all subsets P, Q, and R 
of U, (P − Q) ∩ (R − Q) = (P ∩ R) − Q. 

(1) If x ∈ (P − Q) ∩ (R − Q) then (here comes the blah, blah, blah) x is in P but not in 
Q AND x is in R but not in Q. 

(2) Thus x is in P and R, but x is not in Q. 

(3) Thus x is in (P ∩ R) − Q. This shows that (P − Q) ∩ (R − Q) ⊆ (P ∩ R) − Q. We leave it 
to you to use the element method to show the reverse, (P − Q) ∩ (R − Q) ⊇ (P ∩ R) − Q, 
and hence that (P − Q) ∩ (R − Q) = (P ∩ R) − Q. You should start your argument by 
saying, “Suppose x ∈ (P ∩ R) − Q.” □ 


A different sort of element approach looks at each element of the universal set U and 
asks which sets contain it. The result can be put in tabular form. When this is done, each 
row of the table corresponds to a region in the Venn diagram. The next example illustrates 
this tabular method. 


Example 3 (The tabular method for proofs of set equations) We redo the identity 
of the previous example: (P − Q) ∩ (R − Q) = (P ∩ R) − Q. To do this we construct 
a table whose columns are labeled by various sets and whose entries answer the question 
“Is x in the set?” The first three columns in the following table are set up to allow all 
possible answers to the three questions “Is x in P?” “Is x in Q?” “Is x in R?” “Left” and 
“Right” refer to (P − Q) ∩ (R − Q) and (P ∩ R) − Q, the two sides of the equation we want 
to prove. “Venn” refers to the region in the Venn diagram of Example 1. Normally that 
column would not be in the table, but we’ve inserted it so that you can see how each row 
corresponds to a Venn diagram region. 


P     Q     R     P−Q   R−Q   Left  P∩R   Right  Venn 
No    No    No    No    No    No    No    No     1 
No    No    Yes   No    Yes   No    No    No     3 
No    Yes   No    No    No    No    No    No     2 
No    Yes   Yes   No    No    No    No    No     5 
Yes   No    No    Yes   No    No    No    No     4 
Yes   No    Yes   Yes   Yes   Yes   Yes   Yes    7 
Yes   Yes   No    No    No    No    No    No     6 
Yes   Yes   Yes   No    No    No    Yes   No     8 


Since the answers are identical in the columns labeled “Left” and “Right,” the identity is 
proved. 

We can prove more from the table. For example, 

If P ⊆ Q ∪ R, then P − Q = (P ∩ R) − Q. 

How does the table prove this? Because of the condition, the row that begins “Yes No 
No” is impossible. Therefore, we throw out that row and compare columns “P − Q” and 
“Right.” □ 
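
The tabular method is mechanical enough to hand to a computer. The sketch below is our own 
illustration, not part of the original example: it generates the eight rows of the table with 
True/False in place of Yes/No, compares the “Left” and “Right” columns, and then repeats the 
comparison of “P − Q” with “Right” after discarding the row excluded by P ⊆ Q ∪ R. 

```python
from itertools import product

def table_rows():
    """Yield (p, q, r, left, right, p_minus_q) for all eight membership patterns."""
    for p, q, r in product([False, True], repeat=3):
        p_minus_q = p and not q
        r_minus_q = r and not q
        left = p_minus_q and r_minus_q      # x in (P − Q) ∩ (R − Q)
        right = (p and r) and not q         # x in (P ∩ R) − Q
        yield p, q, r, left, right, p_minus_q

rows = list(table_rows())
identity_holds = all(left == right for _, _, _, left, right, _ in rows)

# Under the hypothesis P ⊆ Q ∪ R, the row with p true and q, r both false cannot occur.
allowed = [row for row in rows if not (row[0] and not row[1] and not row[2])]
conditional_holds = all(pmq == right for _, _, _, _, right, pmq in allowed)
print(identity_holds, conditional_holds)
```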


Another way to prove set identities is to use the basic algebraic identities of Theorem 1. 
This is called the algebraic method. 


Example 4 (An algebraic proof) It is probably a good idea for you to label the 
steps with the appropriate rule (e.g., DeMorgan’s rule, associative rule, distributive rule, 
etc.) the first few times you do such a proof. Therefore, we’ll do that in this example. 
Mathematicians, however, would rarely bother to do it. A proof is accepted if others who 
know the basic rules of set theory can read it, understand it, and believe it is true. 

Let’s prove that Q − (P ∩ R) = Q − (P ∩ Q ∩ R). Here it is: 

Q − (P ∩ R) = Q ∩ (P ∩ R)^c                        since A − B = A ∩ B^c 
            = Q ∩ (P^c ∪ R^c)                      DeMorgan’s rule 
            = (Q ∩ P^c) ∪ (Q ∩ R^c)                distributive rule 
            = (Q ∩ P^c) ∪ ∅ ∪ (Q ∩ R^c)            since A ∪ ∅ = A 
            = (Q ∩ P^c) ∪ (Q ∩ Q^c) ∪ (Q ∩ R^c)    since Q ∩ Q^c = ∅ 
            = Q ∩ (P^c ∪ Q^c ∪ R^c)                distributive rule 
            = Q ∩ (P ∩ Q ∩ R)^c                    DeMorgan’s rule 
            = Q − (P ∩ Q ∩ R)                      since A − B = A ∩ B^c 


Some steps in this proof are baffling. For example, why did we introduce ∅ in the fourth 
line? We knew where we were going because we worked from both “ends” of the proof. In 
other words, we came up with a proof that moved both ends toward the middle and then 
rearranged the steps so that we could go from one end to the other. Unfortunately, proofs 
are often presented this way. 

Here’s another way to write the proof of Q − (P ∩ R) = Q − (P ∩ Q ∩ R) that shows 
more clearly how we got the proof. Note first that this identity is equivalent to showing 
that 

Q ∩ (P ∩ R)^c = Q ∩ (P ∩ Q ∩ R)^c 

since A − B = A ∩ B^c. This is equivalent, by DeMorgan’s rules, to showing that 

Q ∩ (P^c ∪ R^c) = Q ∩ (P^c ∪ Q^c ∪ R^c). 

But 

Q ∩ (P^c ∪ Q^c ∪ R^c) = Q ∩ ((P^c ∪ R^c) ∪ Q^c) = (Q ∩ (P^c ∪ R^c)) ∪ (Q ∩ Q^c) = Q ∩ (P^c ∪ R^c). 

This latter identity follows from the fact that Q ∩ Q^c = ∅ and X ∪ ∅ = X for any set X. 
This completes the proof. 

How should you write an algebraic proof? You can use whichever method you prefer. 
The first approach can be read mechanically because of the way it’s laid out. However, if 
you use the first approach, you may sometimes need to use the second method for yourself 
first. □ 
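
When you write an algebraic proof of your own, it can be reassuring to test every line of the 
chain on small sets before trusting it. The sketch below does this for the first proof of 
Example 4 over all subsets P, Q, R of a three-element universe. It is a sanity check under 
our own choice of universe and helper names, not a substitute for the proof. 

```python
from itertools import combinations, product

U = frozenset(range(3))   # a small universal set chosen for illustration

def subsets(s):
    """Return all subsets of s as frozensets."""
    items = sorted(s)
    return [frozenset(c) for k in range(len(items) + 1) for c in combinations(items, k)]

def comp(x):
    """Complement of x relative to U."""
    return U - x

def chain(P, Q, R):
    """Evaluate each line of the algebraic proof, top to bottom."""
    empty = frozenset()
    return [
        Q - (P & R),
        Q & comp(P & R),
        Q & (comp(P) | comp(R)),
        (Q & comp(P)) | (Q & comp(R)),
        (Q & comp(P)) | empty | (Q & comp(R)),
        (Q & comp(P)) | (Q & comp(Q)) | (Q & comp(R)),
        Q & (comp(P) | comp(Q) | comp(R)),
        Q & comp(P & Q & R),
        Q - (P & Q & R),
    ]

# Every line should evaluate to the same set for every choice of P, Q, R.
all_equal = all(
    len(set(chain(P, Q, R))) == 1
    for P, Q, R in product(subsets(U), repeat=3)
)
print(all_equal)
```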


Ordering Sets 


In computer programming, you will store and compute with sets of all sorts (sets of numbers, 
letters, geometric figures, addresses to arrays, pointers to structures, etc.). In almost all 
cases, you will work with these sets as lists (also called “linear orders”) of some type where 
order does matter. The order matters in terms of the efficiency of your computations, not 
in terms of the rules of set theory. 

In many cases, the linear ordering of the elements of a set is inherited from the universal 
set U. For example, the sets A = {1, 2, 3} and B = {a, b, c} inherit a natural linear ordering 
from the integers and the alphabet, respectively. But what about C = {?, >, <}? There 
is no standard convention for C . You could use the ASCII code order (<, >, ?), but if you 
do, some explanation should be given. 

Example 5 (Lexicographic order) If you have decided on linear orders (i.e., listings) 
for a set X and a set Y, there is a commonly used and natural linear ordering for X × Y 
called lexicographic order. Suppose we list X and Y in some manner: 

(x_1, x_2, ..., x_n) and (y_1, y_2, ..., y_m). 

Given pairs (a, b) ∈ X × Y and (c, d) ∈ X × Y, we say that (a, b) is lexicographically less 
than or equal to (c, d) if 

(1) a is before c in the linear order on X or 

(2) a = c and b is equal to or before d in the linear order on Y. 

For example, the lexicographic order for {1, 2, 7} × {a, b} is 

{(1, a), (1, b), (2, a), (2, b), (7, a), (7, b)}. 

Lexicographic order is called lex order for short. 

Once you have ordered X × Y lexicographically, you can order (X × Y) × Z by the 
same two rules (1) and (2) above, provided an order is specified on Z. You can use lex 
order on X × Y and the given linear order on Z. Likewise, you can apply (1) and (2) to 
X × (Y × Z) using the given linear order on X and lex order on Y × Z. The sets (X × Y) × Z 
and X × (Y × Z) are different sets: elements in the former have the form ((x, y), z) and 
those in the latter have the form (x, (y, z)). Imagine listing (X × Y) × Z in lex order and 
stripping off the inner parentheses so that ((x, y), z) becomes (x, y, z). Now do the same 
with X × (Y × Z). The two lists will contain the same elements in the same order. It’s 
easy to see why the elements are the same, but it’s not so easy to see why the orders are 
the same. Let’s prove the orders are the same. 

That means we have to prove 

((x_1, y_1), z_1) precedes ((x_2, y_2), z_2) if and only if (x_1, (y_1, z_1)) precedes (x_2, (y_2, z_2)). 

It will make things easier if we write “((x_1, y_1), z_1) < ((x_2, y_2), z_2)” for “((x_1, y_1), z_1) 
precedes ((x_2, y_2), z_2).” By the definition of lex order, ((x_1, y_1), z_1) < ((x_2, y_2), z_2) 
means that (x_1, y_1) < (x_2, y_2) or (x_1, y_1) = (x_2, y_2) and z_1 < z_2. By the definition of 
lex order, the first case means that either x_1 < x_2 or x_1 = x_2 and y_1 < y_2. Note that 
(x_1, y_1) = (x_2, y_2) means x_1 = x_2 and y_1 = y_2. Putting all this together, we have shown that 
((x_1, y_1), z_1) < ((x_2, y_2), z_2) means that either 

(1) x_1 < x_2 or 

(2) x_1 = x_2 and y_1 < y_2 or 

(3) x_1 = x_2 and y_1 = y_2 and z_1 < z_2. 

In other words, look for the first position where (x_1, y_1, z_1) and (x_2, y_2, z_2) disagree and use 
that position to determine the order. We leave it to you to show that the same conditions 
describe (x_1, (y_1, z_1)) < (x_2, (y_2, z_2)). This completes the proof. 

Since (X × Y) × Z and X × (Y × Z) have the same order when inner parentheses are 
dropped, one usually does that. We just write X × Y × Z and call the elements (x, y, z). 

When you think about what we’ve just done, you should be able to see that, if 
S_1, S_2, ..., S_n are sets, we can write S_1 × S_2 × ··· × S_n, leaving out parentheses. The 
product S_1 × S_2 × ··· × S_n is also written ×_{i=1}^{n} S_i. 

Suppose (s_1, s_2, ..., s_n) and (t_1, t_2, ..., t_n) are in S_1 × S_2 × ··· × S_n. Which comes 
first? You should be able to see that we determine which precedes which as follows: By 
going left to right, find the first position where they disagree. Say position k is where this 
disagreement occurs. Use the order of s_k and t_k to determine the order of (s_1, s_2, ..., s_n) and 
(t_1, t_2, ..., t_n). This is the same order you use when you look things up in a dictionary. □ 
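
The claim that (X × Y) × Z and X × (Y × Z) list the same triples in the same order can be 
tried out on a small case. In the sketch below, X and Y are taken from the example above, 
while Z = (m, n) is our own choice. We rely on the fact that Python's `itertools.product` 
varies the last coordinate fastest, which is lex order when each factor is supplied in its 
chosen linear order. 

```python
from itertools import product

# Each list is given in its chosen linear order; Z is an illustrative assumption.
X, Y, Z = [1, 2, 7], ["a", "b"], ["m", "n"]

# Lex order on X × Y.
xy = list(product(X, Y))

# (X × Y) × Z with lex order on X × Y, then inner parentheses stripped.
left = [(x, y, z) for (x, y), z in product(xy, Z)]

# X × (Y × Z) with lex order on Y × Z, then inner parentheses stripped.
yz = list(product(Y, Z))
right = [(x, y, z) for x, (y, z) in product(X, yz)]

print(xy)
print(left == right == list(product(X, Y, Z)))
```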


Example 6 (Dictionary order on words or strings) The order of words in the 
dictionary is called “dictionary order.” Lex order appears to be the same as dictionary 
order, but there is a problem with this. We’ve only defined lex order for n-tuples where n 
has some fixed value, but words in the dictionary have different lengths. Let’s look at this 
more carefully. 

Let S be a finite set. Let S^k = ×^k S be the product of S with itself k times. The set 
S^0 is special and consists of one string ε called the empty string. Let S* = ∪_{k=0}^{∞} S^k. In 
words, S* consists of all strings (words, vectors) of all possible lengths (including length 
zero) over S. Assume S is linearly ordered. We now define an order relation on S* called 
lexicographic order or dictionary order, denoted by ≤_L. 

Let (a_1, a_2, ..., a_m) and (b_1, b_2, ..., b_n) be two elements of S* with m, n > 0. We say 
that 

(a_1, a_2, ..., a_m) ≤_L (b_1, b_2, ..., b_n) 

if either of the following two conditions holds: 

(D1) m ≤ n and a_i = b_i for i = 1, ..., m. 

(D2) For some k < min(m, n), a_i = b_i for i = 1, ..., k, a_{k+1} ≠ b_{k+1}, and a_{k+1} is before 
b_{k+1} in the linear order on S. 

Since m, n > 0, we have not discussed the empty string. Thus we need: 

(D3) The empty string ε ≤_L x for any string x. 

We have just defined dictionary order and also called it lex order. Is this the same as our 
previous definition of lex order? Yes, because the two definitions of lex order agree when 
the strings have the same length. 

We shall study this ordering on words carefully when we study order relations in 
general. For now we just give an example. Let S = {x, y} with the ordering on S the 
alphabetic order. If u = (x, x, y) and v = (x, x, y, x), then u ≤_L v by (D1). If s = (x, x, y, x) 
and t = (x, x, x, y), then t ≤_L s by (D2). More examples will be given in the exercises. The 
standard English dictionary is an example where this linear order is applied to a subset of 
all words on the standard English alphabet (the words that have meaning in English). 

A variation on this dictionary order is to order all words first by length and then by 
lex order. Thus, u = (y, y, y) comes before v = (x, x, x, x) because u has length three (three 
components) and v has length four. This order on S* is called length-first lex order or short 
lex order. □ 
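
Both orders are easy to experiment with. Python compares tuples position by position and 
treats a proper prefix as smaller, which agrees with (D1), (D2) and (D3) when the letters 
themselves are compared alphabetically. The sketch below, with a word list of our own 
choosing, sorts the same strings in dictionary order and in length-first lex order. 

```python
# Strings over S = {x, y}, written as tuples; () plays the role of the empty string.
words = [
    ("x", "x", "y"),
    ("x", "x", "y", "x"),
    ("x", "x", "x", "y"),
    ("y", "y", "y"),
    ("x", "x", "x", "x"),
    (),
]

# Dictionary (lex) order: built-in tuple comparison.
dictionary_order = sorted(words)

# Length-first (short) lex order: compare lengths first, then break ties by lex order.
short_lex_order = sorted(words, key=lambda w: (len(w), w))

for w in dictionary_order:
    print("".join(w) or "(empty)")
```

In `short_lex_order` the string (y, y, y) precedes (x, x, x, x), as in the text, whereas in 
`dictionary_order` it comes last. 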

Subsets of Sets 

We use the notation P(A) to denote the set of all subsets of A and P_k(A) the set of all 
subsets of A of size (or cardinality) k. We call P(A) “the set of all subsets of A” or simply 
the power set of A. Let C(n, k) = |P_k(A)| denote the number of different k-subsets that 
can be formed from an n-set. The notation \binom{n}{k} is also frequently used. These are called 
binomial coefficients and are read “n choose k.” We now prove 


Theorem 2 (Binomial coefficient formula) The value of the binomial coefficient is 

\binom{n}{k} = C(n, k) = n(n − 1)···(n − k + 1) / k! = n! / (k! (n − k)!), 

where 0! = 1 and, for j > 0, j! is the product of the first j positive integers. We read j! as 
“j factorial”. 


Proof: Let A be a set of size n. The elements of P_k(A) are sets and are thus unordered. 
Generally speaking, unordered things are harder to count than ordered ones. Suppose, 
instead of a set of size k chosen from A, you wanted to construct an ordered list L of k 
elements from A (L is called a “k-list”). We could construct L in two stages. 

• First choose an element S ∈ P_k(A) (a subset of A with k elements). This can be 
done in C(n, k) ways since C(n, k) = |P_k(A)|. 

• Next order S to obtain L. This ordering can be done in k! = k(k − 1)···1 ways. Why? 
You have k choices for the element of S to appear first in the list L, k − 1 choices for 
the next element, k − 2 choices for the next element, etc. 

From this two-stage process, we see that there are C(n, k) k! ordered k-lists with no repeats. 
(The factor C(n, k) is the number of ways to carry out the first stage and the factor k! is 
the number of ways to carry out the second stage.) 


Theorem 3 (Number of ordered lists) The number of ordered k-lists L that can be 
made from an n-set A is 

• n^k if repeats are allowed and 

• n(n − 1)···(n − k + 1) = n!/(n − k)! if repeats are not allowed. One also uses the 
notation (n)_k for these values. This is called the “falling factorial” and is read “n 
falling k”. 


Why? With repeats allowed, there are n choices of elements in A for the first entry 
in the k-list L, n choices for the second entry, etc. If repeats are not allowed, there are 
n choices of elements in A for the first entry in the k-list L, n − 1 choices for the second 
entry, etc. 

Since we’ve counted the same thing (k-lists made from A with no repeats) in two different 
ways, the two answers must be equal; that is, C(n, k) k! = n!/(n − k)!. Dividing by k!, we 
have the theorem. □ 
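
The counts in Theorems 2 and 3 can be confirmed by brute force on a small set. The sketch 
below uses a 5-element set and k = 3, both chosen only for illustration, and compares the 
number of objects actually generated with the formulas, using the standard library functions 
`math.comb` and `math.perm`. 

```python
from itertools import combinations, permutations, product
from math import comb, factorial, perm

A = ["a", "b", "c", "d", "e"]   # an n-set with n = 5 (illustrative choice)
n, k = len(A), 3

subsets_k = list(combinations(A, k))             # the k-subsets of A
lists_no_repeat = list(permutations(A, k))       # k-lists without repeats
lists_with_repeat = list(product(A, repeat=k))   # k-lists with repeats allowed

print(len(subsets_k), comb(n, k))
print(len(lists_no_repeat), perm(n, k), comb(n, k) * factorial(k))
print(len(lists_with_repeat), n ** k)
```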

In high school, you learned about “Pascal’s Triangle” for computing binomial 
coefficients. We review this idea in the next example. 

Example 7 (Binomial recursion) Let X = {x_1, ..., x_n}. We’ll think of C(n, k) as 
counting k-subsets of X. Imagine that we are going to construct a subset S of X with k 
elements. Either the element x_n is in our subset S or it is not. The cases where it is in the 
subset S are all formed by taking the various (k − 1)-subsets of X − {x_n} and adding x_n 
to them. By the definition of binomial coefficients, there are \binom{n−1}{k−1} such subsets. The cases 
where it is not in the subset S are all formed by taking the various k-subsets of X − {x_n}. 
By the definition of binomial coefficients, there are \binom{n−1}{k} such subsets. What we’ve done 
is describe how to build all k-subsets of X from certain subsets of X − {x_n}. Since this 
gives each subset exactly once, 

\binom{n}{k} = \binom{n−1}{k−1} + \binom{n−1}{k}, 

which can be written C(n, k) = C(n − 1, k − 1) + C(n − 1, k). This equation is called a 
recursion because it tells how to compute the function C(n, k) from values of the function 
with smaller arguments. Here are the starting values together with the basic recursion: 

C(1, 0) = C(1, 1) = 1, 

C(1, k) = 0 for k ≠ 0, 1 and 

C(n, k) = C(n − 1, k − 1) + C(n − 1, k) for n > 1. 

Below we have made a table of values for C(n, k). 



This tabular representation of C(n, k) is called “Pascal’s Triangle.” □ 
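
The recursion with its starting values is already a complete recipe for computing C(n, k). 
Here is a minimal sketch of it as a recursive Python function; the memoization via 
`functools.lru_cache` is our own addition to avoid recomputing values, and the comparison 
with `math.comb` is only a cross-check. 

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def C(n, k):
    """Binomial coefficient via the recursion of Example 7 (requires n >= 1)."""
    if n == 1:
        return 1 if k in (0, 1) else 0   # starting values: C(1,0) = C(1,1) = 1, else 0
    return C(n - 1, k - 1) + C(n - 1, k)

# Print the first few rows of the table of values.
for n in range(1, 7):
    print([C(n, k) for k in range(n + 1)])

# Cross-check one value against the library function.
print(C(6, 3) == comb(6, 3))
```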

Definition 2 (Characteristic function) Let U be the universal set and let A ⊆ U. 
The characteristic function of A, denoted χ_A, is defined for each x ∈ U by 

χ_A(x) = 1 if x ∈ A, 
χ_A(x) = 0 if x ∉ A. 

Thus the domain of χ_A is U and the range of χ_A is {0, 1}.² 


2 If you are not familiar with “domain” and “range”, see the definition at the beginning 
of the next section. 

Example 8 (Subsets as (0,1)-vectors) If A has n elements, listed (a_1, a_2, ..., a_n), 
then you can specify any subset X ⊆ A by a sequence (ε_1, ε_2, ..., ε_n) where ε_k = 0 if the 
element a_k ∉ X and ε_k = 1 if the element a_k ∈ X. The vector (ε_1, ε_2, ..., ε_n) is just the 
characteristic function of X since ε_k = χ_X(a_k). 

How many different subsets of A are there? We’ll show that there are 2^n choices for 
(ε_1, ε_2, ..., ε_n) and thus |P(A)| = 2^n. Why 2^n? There are clearly two choices for ε_1 and two 
choices for ε_2 and so forth. Thus there are 2 × 2 × ··· × 2 = 2^n choices for (ε_1, ε_2, ..., ε_n). □ 
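
The correspondence between subsets and (0,1)-vectors is exactly how one would enumerate a 
power set on a computer. The sketch below assumes a three-element set with element names 
of our own choosing and converts in both directions. 

```python
from itertools import product

A = ["a1", "a2", "a3"]   # the listing (a_1, a_2, a_3); names are illustrative

def subset_from_vector(bits):
    """Return the subset X of A whose characteristic function has the values bits."""
    return {a for a, bit in zip(A, bits) if bit == 1}

def vector_from_subset(X):
    """Return the vector of characteristic function values of X on a_1, ..., a_n."""
    return tuple(1 if a in X else 0 for a in A)

# One subset for each (0,1)-vector of length n.
power_set = [subset_from_vector(bits) for bits in product([0, 1], repeat=len(A))]
print(len(power_set))
print(vector_from_subset({"a1", "a3"}))
```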


Example 9 (Sets with sets as elements) Sets can have sets as elements. In the first 
exercise of this section, you will be asked such questions as “Is {1, 2} ∈ {{1, 2}, {3, 4}}?” 
or “Is 1 ∈ {{1}, {2}, {3}}?” Easy stuff if you understand the definitions: You can see that 
the set {1, 2} is indeed an element of the set {{1, 2}, {3, 4}} because this latter set has just 
two elements, each of them a set of size two, one of which is {1, 2}. You can also see that 
every element of {{1}, {2}, {3}} is a set and that the number 1 is nowhere to be found as 
an element of this set. 

You have already seen P(A), which is a set whose elements are sets, namely the subsets 
of A. 

Another important class of sets with sets as elements are the set partitions. Some 
of the elementary aspects of set partitions fit into our present discussion. More advanced 
aspects of them will be discussed in Section 2. Here is a preview. Let A = {1, 2, ..., 15}. 
Consider the following set whose elements are themselves subsets of A. 

α = {{1}, {2}, {9}, {3, 5}, {4, 7}, {6, 8, 10, 15}, {11, 12, 13, 14}}. 

This set is a subset of the power set P(A). But, it is a very special type of subset, called a 
set partition of A because it satisfies the three conditions: 

(1) every element of α is nonempty, 

(2) the union of the elements of α is A, and 

(3) if you pick sets X ∈ α and Y ∈ α, either X = Y or X ∩ Y = ∅. 

Any collection of subsets of a set A satisfying (1), (2), and (3) is a set partition of A or 
simply a partition of A. Each element of α (which is, of course, a subset of A) is called a 
block of the partition α. 

How many partitions are there of a set A? This is a tricky number to compute and 
there is no simple formula like C(n, k) = n!/(k! (n − k)!) for it. We will discuss it in Section 2. 
The number of partitions of a set of size n is denoted by B_n. These numbers are called Bell 
numbers after Eric Temple Bell. The first few Bell numbers are B_1 = 1, B_2 = 2, B_3 = 5, 
B_4 = 15, B_5 = 52. 

We can refine the partition α by splitting blocks into smaller blocks. For example, we 
might split the block {6, 8, 10, 15} into two blocks, say {6, 15} and {8, 10}, and also split 
the block {11, 12, 13, 14} into three blocks, say {13}, {14}, and {11, 12}. The resulting 
partition is called a refinement of α and equals 

{{1}, {2}, {9}, {3, 5}, {4, 7}, {6, 15}, {8, 10}, {13}, {14}, {11, 12}}. 

Note that a refinement of a partition is another partition of the same set. We also consider 
a partition α to be a refinement of itself. We shall gain a deeper understanding of the 
notion of refinement when we study order relations. □ 
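
Conditions (1), (2) and (3) translate directly into a test that a program can run, and for 
small sets the partitions can simply be listed and counted. The sketch below is our own 
illustration: `is_partition` checks the three conditions, and `partitions` enumerates all set 
partitions by deciding, for one element at a time, whether it joins an existing block or starts 
a new one. This enumeration is only practical for small n. 

```python
def is_partition(blocks, A):
    """Check conditions (1)-(3) for a collection of subsets of A."""
    blocks = [frozenset(b) for b in blocks]
    nonempty = all(len(b) > 0 for b in blocks)                 # condition (1)
    covers = set().union(*blocks) == set(A)                    # condition (2)
    disjoint = all(x == y or not (x & y)                       # condition (3)
                   for x in blocks for y in blocks)
    return nonempty and covers and disjoint

def partitions(items):
    """Yield every set partition of the list items as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in partitions(rest):
        # put `first` into one of the existing blocks ...
        for i in range(len(smaller)):
            yield smaller[:i] + [smaller[i] | {first}] + smaller[i + 1:]
        # ... or into a new block of its own
        yield smaller + [{first}]

alpha = [{1}, {2}, {9}, {3, 5}, {4, 7}, {6, 8, 10, 15}, {11, 12, 13, 14}]
print(is_partition(alpha, range(1, 16)))

# Number of partitions of an n-set for n = 1, ..., 5.
bell = [sum(1 for _ in partitions(list(range(n)))) for n in range(1, 6)]
print(bell)
```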


Exercises for Section 1 

1.1. Answer the following about the ∈ and ⊆ operators. 

(a) Is {1, 2} ∈ {{1, 2}, {3, 4}}? 

(b) Is {2} ∈ {1, 2, 3, 4}? 

(c) Is {3} ∈ {1, {2}, {3}}? 

(d) Is {1, 2} ⊆ {1, 2, {1, 2}, {3, 4}}? 

(e) Is 1 ∈ {{1}, {2}, {3}}? 

(f) Is {1, 2, 1} ⊆ {1, 2}? 

1.2. For each of the following, draw a Venn diagram. 

(a) A ⊆ B, C ⊆ B, A ∩ C = ∅. 

(b) A ⊇ C, B ∩ C = ∅. 

1.3. Let A = {w,x,y,z} and B = {a, b}. Take the linear orders on A and B to be 
alphabetic order. List the elements in each of the following sets in lexicographic 
order. 

(a) A × B 

(b) B × A 

(c) A × A 

(d) B × B 

1.4. Let A = {1, 2, 3}, B = {u, v}, and C = {m, n}. Take the linear order on A to be 
numeric and the linear orders on B and C to be alphabetic. List the elements in 
each of the following sets in lexicographic order. 

(a) A × (B × C) (use lex order on B × C). 

(b) (A × B) × C (use lex order on A × B). 

(c) A × B × C. 

1.5. Let Σ = {x, y} be an alphabet. List each of the following sets of strings over this 
alphabet in the order indicated. 

(a) All palindromes (strings that read the same forward and backward) of length 
less than or equal to 4. List them in dictionary order. 

(b) All strings (words) that begin with x and have length less than four. List them 
in both dictionary and length-first lex order. 

(c) List all strings of length four in lex order. 

1.6. Each of the following statements about subsets of a set U is FALSE. Draw a Venn 
diagram to represent the situation being described. In each case, show that 
the assertion is false by specializing the sets. 

(a) For all A, B, and C, if A ⊈ B and B ⊈ C then A ⊈ C. 

(b) For all sets A, B, and C, (A ∪ B) ∩ C = A ∪ (B ∩ C). 

(c) For all sets A, B, and C, (A − B) ∩ (C − B) = A − (B ∪ C). 

(d) For all A, B, and C, if A ∩ C ⊆ B ∩ C and A ∪ C ⊆ B ∪ C then A = B. 

(e) For all A, B, and C, if A ∪ C = B ∪ C then A = B. 

(f) For all sets A, B, and C, (A − B) − C = A − (B − C). 

1.7. Prove each statement directly from the definitions. 

(a) If A, B, and C are subsets of U, then A ⊆ B and A ⊆ C implies that A ⊆ B ∩ C. 

(b) If A, B, and C are subsets of U, then A ⊆ B and A ⊆ C implies that A ⊆ B ∪ C. 

1.8. Prove, using the definition of set equality, that for all sets A, B, and C, 

(A − B) ∩ (C − B) = (A ∩ C) − B. 

1.9. Prove each statement by the method indicated. 

(a) Prove using element arguments that if U is the universal set and A and B are 
subsets of U, then A ⊆ B implies that U − A ⊇ U − B (alternative notation: 
A ⊆ B implies A^c ⊇ B^c, or A′ ⊇ B′). 

(b) Prove, using element arguments and the definition of set inclusion, that for all 
A, B, and C, if A ⊆ B then A ∩ C ⊆ B ∩ C. 

(c) Prove, using (a), (b), and DeMorgan’s law, that for all A, B, and C, if A ⊆ B 
then A ∪ C ⊆ B ∪ C. 

1.10. Prove each statement by the “element method.” 

(a) If A, B, and C are subsets of U, then A × (B ∪ C) = (A × B) ∪ (A × C). 

(b) If A, B, and C are subsets of U, then A × (B ∩ C) = (A × B) ∩ (A × C). 

1.11. Prove each of the following identities from the basic algebraic rules for sets. You 
may want to use the fact that D − E = D ∩ E^c. 

(a) If A, B, and C are subsets of U, then (A − B) − C = A − (B ∪ C). 

(b) If A, B, and C are subsets of U, then (A − B) − C = (A − C) − B. 

(c) If A and B are subsets of U, then (A − B) ∪ (B − A) = (A ∪ B) − (A ∩ B). 

1.12. Prove or give a counterexample. Use a Venn diagram argument for the proof. For 
the counterexample, use a Venn diagram or use set specialization. 

(a) If A, B, and C are subsets of U, then (A − C) ∩ (B − C) ∩ (A − B) = ∅. 

(b) If A and B are subsets of U and if A ⊆ B, then A ∩ (U − B) = ∅. 

(c) If A, B, and C are subsets of U, and if A ⊆ B, then A ∩ (U − (B ∩ C)) = ∅. 

(d) If A, B, and C are subsets of U, and if (B ∩ C) ⊆ A, then (A − B) ∩ (A − C) = ∅. 

(e) If A and B are subsets of U and if A ∩ B = ∅, then A × B = ∅. 

1.13. Recall that the symmetric difference of sets A and B is A ⊕ B = (A − B) ∪ (B − A). 
It is evident from the definition that A ⊕ B = B ⊕ A, the commutative law. Let 
U be the universal set. Prove each of the following properties either using a Venn 
diagram argument or algebraically or directly from the definition. 

(a) A ⊕ (B ⊕ C) = (A ⊕ B) ⊕ C (associative law for ⊕). 

(b) A ⊕ ∅ = A. 

(c) A ⊕ A^c = U. 

(d) A ⊕ A = ∅. 

(e) If A ⊕ C = B ⊕ C then A = B. 

1.14. Let A, B, and C be subsets of U. Prove or disprove using Venn diagrams. 

(a) A − B and B − C are disjoint. 

(b) A − B and C − B are disjoint. 

(c) A − (B ∪ C) and B − (A ∪ C) are disjoint. 

(d) A − (B ∩ C) and B − (A ∩ C) are disjoint. 

1.15. Which of the following are partitions of {1, 2,..., 8}? Explain your answers. 

(a) {{1,3,5}, {1,2,6}, {4,7,8}} 

(b) {{1,3,5}, {2,6,7}, {4,8}} 

(c) {{1,3,5}, {2,6}, {2,6}, {4,7,8}} 

(d) {{1,5}, {2,6}, {4,8}} 

1.16. How many refinements are there of the partition {{1, 3, 5}, {2, 6}, {4, 7, 8, 9}}? 
Explain. 

1.17. Suppose S and T are sets with S ∩ T = ∅. Suppose σ is a partition of S and τ is a 
partition of T. 

(a) Prove that σ ∪ τ is a partition of S ∪ T. 

(b) If σ has n_σ refinements and τ has n_τ refinements, how many refinements does 
σ ∪ τ have? Explain. 


1.18. Use the characteristic function format to list the power set of the following sets. 
That is, describe each element of the power set as a vector of zeroes and ones. 

(a) {1,2,3} 

(b) X xY where X = {a, b} and Y = {x, y}. 

1.19. Find the following power sets: 

(a) P(∅) 

(b) P(P(∅)) 

(c) P(P(P(∅))) 

1.20. Compare the following pairs of sets. Can they be equal? Is one a subset of the 
other? Can they have the same size (number of elements)? 

(a) P(A ∪ B) and P(A) ∪ P(B) 

(b) P(A ∩ B) and P(A) ∩ P(B) 

(c) P(A × B) and P(A) × P(B) 

1.21. Let S = {1, 2, ..., n}. Let S_1 be the set of all subsets of S that contain 1. Let T_1 
denote the set of all subsets of S that don’t contain 1. Prove |T_1| = |S_1| = 2^(n−1). 


Section 2: Functions 


Functions, such as linear functions, polynomial functions, trigonometric functions, 
exponential functions, and logarithmic functions are familiar to all students who have had 
mathematics in high school. For discrete mathematics, we need to understand functions at 
a basic set theoretic level. We begin with a familiar definition. 


Definition 3 (Function) If A and B are sets, a function from A to B is a rule that 
tells us how to find a unique b ∈ B for each a ∈ A. We write f(a) = b and say that f maps 
a to b. We also say the value of f at a is b. 

We write f : A → B to indicate that f is a function from A to B. We call the set A 
the domain of f and the set B the range or, equivalently, codomain of f. 

To specify a function completely you must give its domain, range and rule. 

If X ⊆ A, then f(X) = {f(x) | x ∈ X}. In particular f(∅) = ∅ and f(A) is called the 
image of f. 

Some people define “range” to be the values that the function actually takes on. Most 
people call that the image. 

In high school, you dealt with functions whose ranges were ℝ and whose domains were 
contained in ℝ; for example, f(x) = 1/(x^2 − 1) is a function from ℝ − {−1, 1} to ℝ. If 
you have had some calculus, you also studied functions of functions! The derivative is a 
function whose domain is all differentiable functions and whose range is all functions. If we 
wanted to use functional notation we could write D(f) to indicate the function that the 
derivative associates with f. 

The set of all functions from A to B is written B^A. One reason for this notation, as 
we shall see below, is that |B^A| = |B|^|A|. Thus f : A → B and f ∈ B^A say the same thing. 

To avoid the cumbersome notation {1, 2, 3, ..., n}, we will often use n̲ (an underlined n) 
instead. 


Example 10 (Functions as relations) There is a fundamental set-theoretic way of 
defining functions. Let A and B be sets. A relation from A to B is a subset of A × B. For 
example, if A = 3̲ = {1, 2, 3} and B = 4̲, then R = {(1, 4), (1, 2), (3, 3), (2, 3)} is a relation 
from A to B. To specify a relation, you must define three sets: A, B and R. 

If the relation R satisfies the condition ∀x ∈ A ∃! y ∈ B, (x, y) ∈ R, then the relation 
R is called a functional relation. We used some shorthand notation here that is worth 
remembering: 

∀ means “for all” 

∃ means “for some” or “there exists” 

∃! means “for exactly one” 

If you think about Definition 3, you will realize that a “functional relation” is just one 
possible way of giving all of the information required to specify a function. 

Given any relation R ⊆ A × B, the inverse relation R^{−1} from B to A is {(y, x) : 
(x, y) ∈ R}. For R = {(1, 4), (1, 2), (3, 3), (2, 3)}, A = 3̲ and B = 4̲, the inverse relation is 
R^{−1} = {(4, 1), (2, 1), (3, 3), (3, 2)}. Note that neither R nor R^{−1} is a functional relation in 
this example. You should make sure that you understand why this statement is true. (Hint: 
R fails the “∃!” test and R^{−1} fails the “∀” part of the definition of a functional relation.) 
Note also that if R and R^{−1} are functional then |A| = |B|. In algebra or calculus, when you 
draw a graph of a real-valued function f : D → ℝ (such as f(x) = x^3, f(x) = x/(1 − x), 
etc.), you are attempting a pictorial representation of the set {(x, f(x)) : x ∈ D ⊆ ℝ}, 
which is a subset of D × ℝ. This subset is a “functional relation from D to ℝ.” 

In our notation, we would write (a, b) ∈ R to indicate that the pair (a, b) is in the 
relation R from A to B. People also use the notation a R b to indicate this. For example, 
the “less than” relation {(a, b) | a < b} is written a < b. □ 
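
The “∀x ∃! y” condition can be tested directly when the sets are finite. The sketch below 
stores a relation as a Python set of pairs and checks the relation R of this example and its 
inverse. The function names and the extra relation F are our own illustrative choices. 

```python
def is_functional(R, A):
    """True if every x in A is the first coordinate of exactly one pair in R."""
    return all(sum(1 for (a, _) in R if a == x) == 1 for x in A)

def inverse(R):
    """Return the inverse relation {(y, x) : (x, y) in R}."""
    return {(y, x) for (x, y) in R}

A, B = {1, 2, 3}, {1, 2, 3, 4}
R = {(1, 4), (1, 2), (3, 3), (2, 3)}

print(is_functional(R, A))            # 1 is paired with both 4 and 2
print(is_functional(inverse(R), B))   # 1 in B is paired with nothing

# A relation from A to B that is functional (hypothetical example).
F = {(1, 4), (2, 3), (3, 3)}
print(is_functional(F, A))
```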


In many cases in discrete mathematics, we are concerned with functions whose domain 
is finite. Special notation is used for specifying such functions. 


Definition 4 (One-line notation) Let A be a finite ordered set with elements ordered 
(a_1, a_2, ..., a_|A|). Let B be any set. A function f : A → B can be written in one-line 
notation as f = (f(a_1), f(a_2), ..., f(a_|A|)). Thus the values of the function are written as 
a list, which is also called a vector or a string. In other words the function f assigns to 
a_k the kth element of the list (f(a_1), f(a_2), ..., f(a_|A|)) for each value of k from 1 to |A|. 


It follows from the definition that we can think of a function as an element of B^|A| = B × 
B × ··· × B, where there are |A| copies of B. This is another reason for the notation B^A 
for all functions from A to B. Do you see why we don’t use B^|A| instead? No, it’s not 
because B^A is easier to write. It’s because B^|A| does not specify the domain A. Instead, 
only its size |A| is given. 


Example 11 (Using the notation) To get a feeling for the notation used to specify a 
function, it may be helpful to imagine that you have an envelope or box that contains a 
function. In other words, this envelope contains all the information needed to completely 
describe the function. Think about what you’re going to see when you open the envelope. 

You might see 

$P = \{a, b, c\}$, $g : P \to \underline{4}$, $g(a) = 3$, $g(b) = 1$ and $g(c) = 4$.

This tells you that the name of the function is $g$, the domain of $g$ is $P$, which is
$\{a, b, c\}$, and the range of $g$ is $\underline{4} = \{1, 2, 3, 4\}$. It also tells you the
values in $\underline{4}$ that $g$ assigns to each of the values in its domain. Someone else may
have put

$g \in \underline{4}^{\{a, b, c\}}$, ordering: $a, b, c$, $g = (3, 1, 4)$.

in the envelope instead. This describes the same function. It doesn’t give a name for the 
domain, but that’s okay since all we need to know is what’s in the domain. On the other 
hand, it gives an order on the domain so that the function can be given in one-line form. 
Since the domain is ordered $a, b, c$ and since $g = (3, 1, 4)$, by the definition of one-line
notation $g(a) = 3$, $g(b) = 1$ and $g(c) = 4$. Can you describe other possible envelopes for
the same function?

What if the envelope contained only g = (3,1,4)? If you think you have been given 
the one-line notation for g, you are mistaken. You must know the ordered domain of g 
before you can interpret g = (3,1,4). Here we don’t even know the domain as a set (or the 
range). The domain might be $\{a, b, c\}$, or $\{<, >, ?\}$, or any other 3-set.

What if the envelope contained 

the domain of g is {a, b, c}, ordering: a, b,c, g = (3,1,4)? 

We haven’t specified the range of g , but is it necessary since we know the values of the 
function? Our definition included the requirement that the range be specified, so this is 
not a complete definition. Some definitions of a function do not require that the range be 
specified. For such definitions, this would be a complete specification of the function $g$. □

Example 12 (Counting functions) Think about specifying $f : A \to B$ in one-line
notation: $(f(a_1), f(a_2), \ldots, f(a_{|A|}))$. There are $|B|$ ways to choose $f(a_1)$, $|B|$
ways to choose $f(a_2)$, etc., and finally $|B|$ ways to choose $f(a_{|A|})$. This means that the
cardinality of the set of all functions $f : A \to B$ is $|B|^{|A|}$. In other words,
$|B^A| = |B|^{|A|}$.

We can represent a subset $S$ of $A$ by a unique function $f : A \to \underline{2}$ where

$$f(x) = \begin{cases} 1, & \text{if } x \notin S, \\ 2, & \text{if } x \in S. \end{cases}$$

This proves that there are $2^{|A|}$ such subsets. We proved this result in Example 9. You
should verify that this is essentially the same proof that was given there.

We can represent a list of $k$ elements of a set $S$ with repetition allowed by a unique
function $f : \underline{k} \to S$. In this representation, the list corresponds to the function
written in one-line notation. (Recall that the ordering on $\underline{k}$ is the numerical
ordering.) This proves that there are exactly $|S|^k$ such lists. □
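The counting argument can be checked by brute force on a small case. The sketch below is illustrative only: it assumes a function is stored as its one-line tuple over an ordered domain, and the names `all_functions` and `subset_to_function` are made up for this example.

```python
from itertools import product


def all_functions(domain, codomain):
    """Return every function domain -> codomain as a one-line tuple."""
    return list(product(codomain, repeat=len(domain)))


def subset_to_function(S, domain):
    """Encode a subset S of domain as the tuple that is 2 on S and 1 elsewhere."""
    return tuple(2 if x in S else 1 for x in domain)


A = ["a", "b", "c"]                 # ordered domain
B = [1, 2, 3, 4]                    # codomain
functions = all_functions(A, B)     # each tuple is (f(a), f(b), f(c))

print(len(functions) == len(B) ** len(A))
print(subset_to_function({"a", "c"}, A))
```

Each position of the tuple is chosen independently from `B`, which is exactly the reasoning behind the count $|B|^{|A|}$; taking the codomain to be `[1, 2]` gives the subset count.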


Definition 5 (Types of functions) Let $f : A \to B$ be a function. If for every $b \in B$
there is an $a \in A$ such that $f(a) = b$, then $f$ is called a surjection (or an onto function).
Another way to describe a surjection is to say that it takes on each value in its range at
least once.

If $f(x) = f(y)$ implies $x = y$, then $f$ is called an injection (or a one-to-one function).
Another way to describe an injection is to say that it takes on each value in its range at
most once.

If $f$ is both an injection and a surjection, it is called a bijection. The bijections of
$A^A$ are called the permutations of $A$. The set of permutations on a set $A$ is denoted in
various ways. Two notations are PER($A$) and $S(A)$.

If $f : A \to B$ is a bijection, we may talk about the inverse bijection of $f$, written $f^{-1}$,
which reverses what $f$ does. Thus $f^{-1} : B \to A$ and $f^{-1}(b)$ is that unique $a \in A$ such
that $f(a) = b$.

Note that $f(f^{-1}(b)) = b$ and $f^{-1}(f(a)) = a$. Do not confuse $f^{-1}$ with $1/f$. For example,
if $f : \mathbb{R} \to \mathbb{R}$ is given by $f(x) = x^3 + 1$, then $1/f(x) = 1/(x^3 + 1)$ and
$f^{-1}(x) = (x - 1)^{1/3}$.


Example 13 (Surjections, injections and bijections as lists) Lists provide another 
fundamental way to think about the various types of functions we’ve just defined. We’ll 
illustrate this with some examples. 

Let $A = \underline{4}$, $B = \{a, b, c, d, e\}$ and $f = (d, c, d, a)$ describe the function $f$
in one-line notation. Since the list $(d, c, d, a)$ contains $d$ twice, $f$ is not an injection.
The function $(b, d, c, e) \in B^A$ is an injection since there are no repeats in the list of
values taken on by the function. The 4-lists without repeats that can be formed from $B$
correspond to the injections from $\underline{4}$ to $B$. In general, the injections in
$S^{\underline{k}}$ correspond to $k$-lists without repeats whose elements are taken from $S$.

With the same $f$ as in the previous paragraph, note that the value $b$ is not taken on
by $f$. Thus $f$ is not a surjection. (We could have said $e$ is not taken on, instead.)

Now let $A = \underline{4}$, $B = \{x, y, z\}$ and $g = (x, y, x, z)$. Since every element of $B$
appears at least once in the list of values taken on, $g$ is a surjection.

Finally, let $A = B = \underline{4}$ and $h = (3, 1, 4, 2)$. The function is both an injection and
a surjection. Hence, it is a bijection. Since the domain and range are the same and $h$
is a bijection, it is a permutation of $A = \underline{4}$. The list $(3, 1, 4, 2)$ is a
rearrangement (a permutation) of the ordered listing $(1, 2, 3, 4)$ of $A$. That’s why we call
$h$ a permutation. The inverse of $h$ is $(2, 4, 1, 3)$. □
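The list view of these properties translates directly into code. Here is a small sketch, assuming a function is given as a one-line tuple and, for the inverse, that it is a permutation of $\{1, \ldots, n\}$; the helper names are hypothetical.

```python
def is_injection(values):
    """No value is repeated in the one-line list."""
    return len(set(values)) == len(values)


def is_surjection(values, codomain):
    """Every element of the codomain appears in the one-line list."""
    return set(values) == set(codomain)


def inverse_one_line(h):
    """Inverse of a permutation of {1, ..., n} given in one-line form."""
    inv = [0] * len(h)
    for position, value in enumerate(h, start=1):
        inv[value - 1] = position     # h(position) = value, so h^-1(value) = position
    return tuple(inv)


f = ("d", "c", "d", "a")              # codomain {a, b, c, d, e}
g = ("x", "y", "x", "z")              # codomain {x, y, z}
h = (3, 1, 4, 2)                      # codomain {1, 2, 3, 4}

print(is_injection(f), is_surjection(f, "abcde"))
print(is_injection(g), is_surjection(g, "xyz"))
print(is_injection(h), is_surjection(h, range(1, 5)), inverse_one_line(h))
```

Note that surjectivity needs the codomain as an extra argument, while injectivity can be read off the list alone.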


Example 14 (Encryption) Suppose we want to send data (a text message, a JPEG
file, etc.) to someone and want to be sure no one else can read the data. Then we use
encryption. We can describe encryption as a function $f : D \to R$ where $D$ is the set of
possible messages. Of course, there are a huge number of possible messages, so what do
we do? We can break the message into pieces. For example, we could break an ordinary
text message into pieces with one character (with the space as a character) per piece. Then
apply a function $f$ to each piece. Here’s a simple example: If $x$ is a letter, $f(x)$ is the next
letter in the alphabet with Z $\to$ A and $f(\text{space}) = \text{space}$. Then we would encrypt “HELLO
THERE” as “IFMMP UIFSF.” This is too simple for encryption.
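The next-letter rule can be written out as a short program. This is only a sketch of the toy cipher described above, assuming messages use the 26 uppercase letters and the space; the function names are illustrative.

```python
import string

LETTERS = string.ascii_uppercase


def f(ch):
    """Next letter in the alphabet, with Z wrapping to A; the space is fixed."""
    if ch == " ":
        return " "
    return LETTERS[(LETTERS.index(ch) + 1) % 26]


def f_inverse(ch):
    """Previous letter in the alphabet, with A wrapping to Z; the space is fixed."""
    if ch == " ":
        return " "
    return LETTERS[(LETTERS.index(ch) - 1) % 26]


def encrypt(message):
    return "".join(f(ch) for ch in message)


def decrypt(message):
    return "".join(f_inverse(ch) for ch in message)


print(encrypt("HELLO THERE"))            # IFMMP UIFSF
print(decrypt(encrypt("HELLO THERE")))   # HELLO THERE
```

Decryption works because `f` is a bijection on the symbol set, so `f_inverse` undoes it character by character.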

What can we do? Let $S$ be the set of symbols that we are using (A to Z and space
in the previous paragraph). We could choose a more complicated function $f : S \to S$ than
our simple function. What properties should it have?

• It must have an inverse so that we can decrypt. 

• The encryption and decryption must be quick on a computer. 

• It must be hard for someone else to figure out $f^{-1}$.

Since $f : S \to S$ and it has an inverse, $f$ must be a bijection (in fact, a permutation of
$S$). How can we make $f$ hard to figure out? That is a problem in the design of encryption
systems. One key ingredient is to make $S$ large. For example, in systems like PGP (Pretty
Good Privacy) and DES (Data Encryption Standard) $S$ consists of all $n$-long vectors of
zeroes and ones, typically with $n = 64$. In this case $|S| = 2^{64} \approx 10^{19}$, which is quite
large. □


Example 15 (Hashing) Hashing is a procedure for mapping a large space into a smaller
one. For example, a hash function $h$ may have as its domain all sequences of zeroes and
ones of all possible lengths. Its range might be all $n$-long sequences of zeroes and ones for
some $n$. There are some publicly available hash functions $h$ that seem to be good.

Why would we want such a function? Suppose we want to be sure no one changes 
a document that is stored in a computer. We could apply h to the document and then 
save h(document). By giving h(document) to people, they could later check to see if the 
document had been changed — if the function h is well chosen it would be hard to change 
the document without changing the value of h, even if you know how to compute h. Suppose 
you email a document to a friend, but you’re concerned that someone may intercept the 
email and change the document. You can call up your friend and tell him h(document) so 
that he can check it. 

Another use for a hash function is storing data. Suppose we have an $n$-long array
in which we want to store information about students at the university. We want a hash
function that maps student ID numbers into $\{1, 2, \ldots, n\}$. Then $h(\text{ID})$ tells us which
array position to use. Of course two student ID numbers may hash to the same value (array
position). There are methods for dealing with such conflicts. □
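Both uses of hashing can be tried with a standard library hash. The sketch below assumes SHA-256 from Python's `hashlib` as the hash function $h$; the text does not name a particular function, and the document contents, the student ID string and the helper names are invented for illustration.

```python
import hashlib


def h(document: bytes) -> str:
    """Fixed-length fingerprint of a document (SHA-256, as a hex string)."""
    return hashlib.sha256(document).hexdigest()


def array_position(student_id: str, n: int) -> int:
    """Map a student ID into {1, ..., n} by reducing its hash modulo n."""
    digest = hashlib.sha256(student_id.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % n + 1


original = b"Pay Alice 10 dollars"
tampered = b"Pay Alice 90 dollars"

print(h(original) == h(original))   # same document, same fingerprint
print(h(original) == h(tampered))   # a changed document gives a different fingerprint
print(1 <= array_position("A1234567", 100) <= 100)
```

Reducing modulo `n` is one simple way to land in an array of length `n`; two IDs can still collide, which is the conflict the example mentions.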


Example 16 (Two-line notation) Since one-line notation is a simple, brief way to specify 
functions, it is used frequently. If the domain is not a set of numbers, the notation is poor 
because we must first pause and order the domain. There are other ways to write functions 
which overcome this problem. For example, we could write $f(a) = 4$, $f(b) = 3$, $f(c) = 4$
and $f(d) = 1$. This could be shortened up somewhat to $a \to 4$, $b \to 3$, $c \to 4$ and
$d \to 1$. By turning each of these sideways, we can shorten it even more:
$\begin{pmatrix} a & b & c & d \\ 4 & 3 & 4 & 1 \end{pmatrix}$. For obvious
reasons, this is called two-line notation. Since $x$ always appears directly over $f(x)$, there
is no need to order the domain; in fact, we need not even specify the domain separately 
since it is given by the top line. If the function is a bijection, its inverse is obtained by 
interchanging the top and bottom lines. 

The arrows we introduced in the last paragraph can be used to help visualize different 
properties of functions. Imagine that you’ve listed the elements of the domain A in one 
column and the elements of the range B in another column to the right of the domain. 
Draw an arrow from $a$ to $b$ if $f(a) = b$. Thus the heads of arrows are on elements of $B$ and
the tails are on elements of $A$. Since $f$ is a function, no two arrows have the same tail. If
$f$ is an injection, no two arrows have the same head. If $f$ is a surjection, every element of
$B$ is on the head of some arrow. You should be able to describe the situation when $f$ is a
bijection. □


Example 17 (Compositions of functions) Suppose that $f$ and $g$ are two functions
such that the values $f$ takes on are contained in the domain of $g$. We can write this as
$f : A \to B$ and $g : C \to D$ where $f(a) \in C$ for all $a \in A$. We define the composition of $g$
and $f$, written $gf : A \to D$, by $(gf)(x) = g(f(x))$ for all $x \in A$. The notation $g \circ f$ is
also used to denote composition. Suppose that $f$ and $g$ are given in two-line notation by

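$$f = \begin{pmatrix} p & q & r & s \\ P & R & T & U \end{pmatrix} \qquad g = \begin{pmatrix} P & Q & R & S & T & U & V \\ 1 & 3 & 5 & 2 & 4 & 6 & 7 \end{pmatrix}.$$

Then $gf = \begin{pmatrix} p & q & r & s \\ 1 & 5 & 4 & 6 \end{pmatrix}$.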

Suppose $f : A \to B$, $g : B \to C$, and $h : C \to D$. We can form the compositions $g \circ f$
and $h \circ g$; however, we cannot form the composition $h \circ f$ unless $C$ contains $f(x)$ for
all $x \in A$. We can also form the compositions of all three functions, namely
$h \circ (g \circ f)$ and $(h \circ g) \circ f$. These two compositions are equal; that’s the
“associative law” for composition of functions. How is it proved? Here’s an algebraic proof
that uses nothing more than the definition of $\circ$ at each step: For all $x \in A$

$$(h \circ (g \circ f))(x) = h((g \circ f)(x)) = h(g(f(x))) = (h \circ g)(f(x)) = ((h \circ g) \circ f)(x).$$

Let $A$ be a set. Suppose that $f, g \in S(A)$; that is, $f$ and $g$ are permutations of a
set $A$. Recall that a permutation is a bijection from a set to itself and so it makes sense
to talk about $f^{-1}$ and $fg$. We claim that $fg$ and $f^{-1}$ are also permutations of $A$. This is
easy to see if you write the permutations in two-line form and note that the second line is
a rearrangement of the first if and only if the function is a permutation.

Again suppose that $f \in S(A)$. Instead of $f \circ f$ or $ff$ we write $f^2$. Note that $f^2(x)$ is
not $(f(x))^2$. (In fact, if multiplication is not defined in $A$, $(f(x))^2$ has no meaning.) We
could compose three copies of $f$. The result is written $f^3$. In general, we can compose
$k$ copies of $f$ to obtain $f^k$. A cautious reader may be concerned that $f \circ (f \circ f)$ may
not be the same as $(f \circ f) \circ f$. By the associative law for $\circ$, they’re equal. In fact,
$f^{k+m} = f^k \circ f^m$ for all nonnegative integers $k$ and $m$, where $f^0$ is defined by
$f^0(x) = x$ for all $x$ in the domain. This is true even if $k$ or $m$ or both are negative. □


Example 18 (Composing permutations) Let’s carry out some calculations for practice. 
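Let $f$ and $g$ be the permutations

$$f = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 2 & 1 & 4 & 5 & 3 \end{pmatrix} \qquad g = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 2 & 3 & 4 & 5 & 1 \end{pmatrix}.$$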


To compute $fg$, we must calculate $fg(x)$ for all $x$. This can be done fairly easily from the
two-line form: For example, $(fg)(1)$ can be found by noting that the image of 1 under $g$ is
2 and the image of 2 under $f$ is 1. Thus $(fg)(1) = 1$. You should be able to verify that

$$fg = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 1 & 4 & 5 & 3 & 2 \end{pmatrix} \qquad gf = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 3 & 2 & 5 & 1 & 4 \end{pmatrix} \neq fg.$$

Thus, $f \circ g = g \circ f$ (commutative law) is not a law for permutations.

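It is easy to get the inverse: simply interchange the two lines. Thus

$$f^{-1} = \begin{pmatrix} 2 & 1 & 4 & 5 & 3 \\ 1 & 2 & 3 & 4 & 5 \end{pmatrix} \quad \text{which is the same as} \quad f^{-1} = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 2 & 1 & 5 & 3 & 4 \end{pmatrix}$$

since the order of the columns in two-line form does not matter.

Let’s compute some powers:

$$f^2 = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 1 & 2 & 5 & 3 & 4 \end{pmatrix} \qquad f^3 = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 2 & 1 & 3 & 4 & 5 \end{pmatrix} \qquad f^6 = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 1 & 2 & 3 & 4 & 5 \end{pmatrix}.$$

We computed $f^6$ using $f^6 = f^3 \circ f^3$. That was a bit tedious. Now imagine if you wanted
to compute $f^{100}$. Cycle notation is an easy way to do that. □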
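These calculations are easy to reproduce by machine. The following sketch assumes each permutation of $\{1, \ldots, n\}$ is stored as its one-line tuple (the bottom row of the two-line form), with $f = (2, 1, 4, 5, 3)$ and $g = (2, 3, 4, 5, 1)$; the function names are illustrative.

```python
def compose(f, g):
    """Return fg, the map x -> f(g(x)), in one-line form."""
    return tuple(f[g[x - 1] - 1] for x in range(1, len(g) + 1))


def inverse(f):
    """Swap the two lines of the two-line form and re-sort the columns."""
    inv = [0] * len(f)
    for x, fx in enumerate(f, start=1):
        inv[fx - 1] = x
    return tuple(inv)


def power(f, k):
    """Compose k copies of f, for a nonnegative integer k."""
    result = tuple(range(1, len(f) + 1))   # f^0 is the identity
    for _ in range(k):
        result = compose(f, result)
    return result


f = (2, 1, 4, 5, 3)
g = (2, 3, 4, 5, 1)

print(compose(f, g), compose(g, f))            # fg and gf differ
print(inverse(f))
print(power(f, 2), power(f, 3), power(f, 6))
```

Computing `power(f, 100)` this way takes 100 compositions, which is the tedium that cycle notation avoids.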


Let $f$ be a permutation of the set $A$ and let $n = |A|$. If $x \in A$, we can look at the
sequence $x, f(x), f(f(x)), \ldots, f^k(x), \ldots$, which is often written as
$x \to f(x) \to f(f(x)) \to \cdots \to f^k(x) \to \cdots$. Using the fact that $f^0(x) = x$, we can
write the sequence as $f^0(x) \to f^1(x) \to f^2(x) \to \cdots$. Since the range of $f$ has $n$
elements, this sequence will contain a repeated element in the first $n + 1$ entries. Suppose
that $f^s(x)$ is the first sequence entry that is ever repeated and that $f^p(x)$ is the first time
that it is repeated; that is, $f^s(x) = f^p(x)$ with $s < p$.

We claim that $s = 0$. If $s > 0$, apply $f^{-1}$ to both sides of this equality to obtain
$f^{s-1}(x) = f^{p-1}(x)$, contradicting the fact that $s$ was chosen as small as possible. Thus, in
fact, $s = 0$.

It follows that the sequence cycles through a pattern of length $p$ forever since
$f^{p+1}(x) = f(f^p(x)) = f(x)$, $f^{p+2}(x) = f^2(f^p(x)) = f^2(x)$, and so on. We call
$(x, f(x), \ldots, f^{p-1}(x))$ the cycle containing $x$ and call $p$ the length of the cycle. If a
cycle has length $p$, we call it a $p$-cycle. Cyclic shifts of a cycle are considered the same;
for example, if $(1,2,6,3)$ is the cycle containing 1 (as well as 2, 3 and 6), then $(2,6,3,1)$,
$(6,3,1,2)$ and $(3,1,2,6)$ are other ways of writing the cycle.

Suppose $(x_1, x_2, \ldots, x_p)$ is a cycle of $f$ and that $y_1 \in A$ is not in that cycle. We can
form the cycle containing $y_1$: $(y_1, y_2, \ldots, y_q)$. None of the $y_k$ is in the cycle
$(x_1, \ldots, x_p)$. Why? If it were, we could continue in the cycle and eventually reach $y_1$.
Written out algebraically: If $y_k = x_j$ for some $k$ and $j$, then
$y_1 = f^{q-k+1}(y_k) = f^{q-k+1}(x_j)$ and the right side is in the cycle $(x_1, \ldots, x_p)$.
We have proved:


Theorem 4 (Cycle form of a permutation) Let $f$ be a permutation of the finite set
$A$. Every element of $A$ belongs to a cycle of $f$. Two cycles are either the
same or have no elements in common.


Example 19 (Using cycle notation) Consider the permutation

$$f = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 \\ 2 & 4 & 8 & 1 & 5 & 9 & 3 & 7 & 6 \end{pmatrix}.$$

Since $1 \to 2 \to 4 \to 1$, the cycle containing 1 is $(1,2,4)$. We could equally well write it
$(2,4,1)$ or $(4,1,2)$; however, $(1,4,2)$ is different since it corresponds to
$1 \to 4 \to 2 \to 1$. The usual convention is to list the cycle starting with its smallest
element. The cycles of $f$ are $(1,2,4)$, $(3,8,7)$, $(5)$ and $(6,9)$. We write $f$ in cycle form as

$f = (1,2,4)\,(3,8,7)\,(5)\,(6,9)$.

The order in which the cycles are written doesn’t matter, so we have

$f = (5)\,(6,9)\,(1,2,4)\,(3,8,7)$ and $f = (4,1,2)\,(5)\,(6,9)\,(7,3,8)$,

and lots of other equivalent forms. It is common practice to omit the cycles of length one
and write $f = (1,2,4)(3,8,7)(6,9)$. The inverse of $f$ is obtained by reading the cycles
backwards because $f^{-1}(x)$ is the lefthand neighbor of $x$ in a cycle. Thus

$f^{-1} = (4,2,1)\,(7,8,3)\,(9,6) = (1,4,2)(3,7,8)(6,9)$.


To compute $f(x)$, we simply take one step to the right from $x$ in its cycle. We just saw
that $f^{-1}(x)$ is computed by taking one step to the left. You may be able to guess at this
point that $f^k(x)$ is computed by taking $k$ steps to the right, with the rule that a negative
step to the right is the same as a positive step to the left. This makes it easy to compute
powers.

When $f = (1,2,4)(3,8,7)(5)(6,9)$, what is $f^{100}$? Imagine starting at 1. After 3 steps
to the right we’re back at 1. Do this 33 times so that after $3 \times 33 = 99$ steps to the right
we’re back at 1. One more step takes us to 100 steps and so $f^{100}(1) = 2$. You should be
able to figure out the rest: $f^{100} = (1,2,4)(3,8,7)(5)(6)(9)$. □
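The "take $k$ steps to the right" rule is short to program. Below is a sketch, assuming the permutation of $\{1, \ldots, 9\}$ from this example is stored in one-line form as `(2, 4, 8, 1, 5, 9, 3, 7, 6)`; the helper names are hypothetical.

```python
def cycles(f):
    """Cycle form of a permutation of {1, ..., n} given in one-line form.

    Each cycle starts with its smallest element.
    """
    seen, result = set(), []
    for start in range(1, len(f) + 1):
        if start in seen:
            continue
        cycle, x = [], start
        while x not in seen:
            seen.add(x)
            cycle.append(x)
            x = f[x - 1]
        result.append(tuple(cycle))
    return result


def power_from_cycles(cycle_list, k, n):
    """One-line form of f^k: move every element k steps right in its cycle."""
    image = [0] * n
    for cycle in cycle_list:
        p = len(cycle)
        for i, x in enumerate(cycle):
            image[x - 1] = cycle[(i + k) % p]   # a negative k steps to the left
    return tuple(image)


f = (2, 4, 8, 1, 5, 9, 3, 7, 6)
f_cycles = cycles(f)
f_100 = power_from_cycles(f_cycles, 100, len(f))

print(f_cycles)
print(cycles(f_100))
```

Only the remainder of $k$ modulo each cycle length matters, so the cost does not grow with $k$; calling `power_from_cycles(f_cycles, -1, 9)` gives the inverse.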

We next take a close look at the notions of image and coimage of a function. Again,
let $f : A \to B$ be a function. The image of $f$ is the set of values $f$ actually takes on:
Image($f$) $= \{ f(a) \mid a \in A \}$. The definition of a surjection can be rewritten
Image($f$) $= B$.

For each $b \in B$, the inverse image of $b$, written $f^{-1}(b)$, is the set of those elements in $A$
whose image is $b$; i.e.,

$$f^{-1}(b) = \{ a \mid a \in A \text{ and } f(a) = b \}.$$

This extends our earlier definition of $f^{-1}$ from bijections to all functions; however, such
an $f^{-1}$ can’t be thought of as a function from $B$ to $A$ unless $f$ is a bijection because it
will not give a unique $a \in A$ for each $b \in B$. (There is a slight abuse of notation here:
If $f : A \to B$ is a bijection, our new notation is $f^{-1}(b) = \{a\}$ and our old notation is
$f^{-1}(b) = a$.)

Definition 6 (Coimage) Let $f : A \to B$ be a function. The collection of nonempty
inverse images of elements of $B$ is called the coimage of $f$. In set-theoretic terms,

$$\text{Coimage}(f) = \{f^{-1}(b) \mid b \in B,\ f^{-1}(b) \neq \emptyset\} = \{f^{-1}(b) \mid b \in \text{Image}(f)\}.$$

To describe the structure of coimages, we need to recall that a partition of a set $B$ is
an unordered collection of nonempty subsets of $B$ such that each element of $B$ appears in
exactly one subset. Each subset is called a block of the partition.

Theorem 5 (Structure of coimage) Suppose $f : A \to B$. The coimage of $f$ is the
partition of $A$ whose blocks are the maximal subsets of $A$ on which $f$ is constant.

Wait: before we give a proof we need to understand what we just said. Let’s look at an
example. If $f \in \{a, b, c\}^{\underline{5}}$ is given in one-line form as $(a, c, a, a, c)$, then

$$\text{Coimage}(f) = \{f^{-1}(a), f^{-1}(c)\} = \{\{1,3,4\}, \{2,5\}\},$$

$f$ is $a$ on $\{1,3,4\}$ and is $c$ on $\{2,5\}$. Now let’s prove the theorem.

Proof: If $x \in A$, let $y = f(x)$. Then $x \in f^{-1}(y)$ and so the union of the nonempty inverse
images contains $A$. Clearly it does not contain anything which is not in $A$. If $y_1 \neq y_2$,
then we cannot have $x \in f^{-1}(y_1)$ and $x \in f^{-1}(y_2)$ because this would imply $f(x) = y_1$ and
$f(x) = y_2$, a contradiction of the definition of a function. Thus Coimage($f$) is a partition
of $A$. Clearly $x_1$ and $x_2$ belong to the same block if and only if $f(x_1) = f(x_2)$. Hence a
block is a maximal set on which $f$ is constant. □
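Computing a coimage amounts to grouping domain elements by their value. A minimal sketch, assuming the function is given in one-line form on the domain $\{1, \ldots, n\}$; the name `coimage` and the use of frozensets for blocks are choices made for this illustration.

```python
from collections import defaultdict


def coimage(f):
    """Blocks of the coimage of f, with f in one-line form on {1, ..., n}."""
    blocks = defaultdict(set)
    for x, value in enumerate(f, start=1):
        blocks[value].add(x)          # x joins the inverse image of f(x)
    return {frozenset(block) for block in blocks.values()}


f = ("a", "c", "a", "a", "c")
blocks = coimage(f)

print(blocks)
print(len(blocks) == len(set(f)))     # one block per element of the image
```

The blocks are disjoint and cover the domain because each `x` is added to exactly one set, namely the one labelled by `f(x)`, which mirrors the proof.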

Since Coimage($f$) is a partition of the domain $A$, we need to review the basic
combinatorial properties of partitions.

Example 20 (Set partitions) The 15 partitions of $\{1,2,3,4\}$, classified by number of
blocks, are

1 block:   {{1,2,3,4}}

2 blocks:  {{1,2,3},{4}}   {{1,2,4},{3}}   {{1,2},{3,4}}   {{1,3,4},{2}}
           {{1,3},{2,4}}   {{1,4},{2,3}}   {{1},{2,3,4}}

3 blocks:  {{1,2},{3},{4}}   {{1,3},{2},{4}}   {{1,4},{2},{3}}   {{1},{2,3},{4}}
           {{1},{2,4},{3}}   {{1},{2},{3,4}}

4 blocks:  {{1},{2},{3},{4}}


Let $S(n, k)$ be the number of partitions of an $n$-set having exactly $k$ blocks. These are
called Stirling numbers of the second kind. Do not confuse $S(n, k)$ with $C(n, k) = \binom{n}{k}$.
In both cases we have an $n$-set. For $C(n, k)$ we want to choose a subset containing $k$ elements
and for $S(n, k)$ we want to partition the set into $k$ blocks.

What is the value of $S(n, k)$? Let’s try to get a recursion. How can we build partitions
of $\{1, 2, \ldots, n\}$ with $k$ blocks out of smaller cases? If we take partitions of
$\{1, 2, \ldots, n-1\}$ with $k - 1$ blocks, we can simply add the block $\{n\}$. If we take
partitions of $\{1, 2, \ldots, n-1\}$ with $k$ blocks, we can add the element $n$ to one of the $k$
blocks. You should convince yourself that all $k$ block partitions of $\{1, 2, \ldots, n\}$ arise in
exactly one way when we do this. This gives us a recursion for $S(n, k)$. Putting $n$ in a block
by itself contributes $S(n-1, k-1)$. Putting $n$ in a block with other elements contributes
$k\,S(n-1, k)$. Thus,

$$S(n, k) = S(n-1, k-1) + k\,S(n-1, k).$$
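The recursion can be turned into a few lines of code. This is a sketch rather than a prescribed method: the function names are invented, and the convention $S(0, 0) = 1$ used as a base case is a common one that the text does not state.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def stirling2(n, k):
    """Number of partitions of an n-set into exactly k blocks."""
    if n == k:
        return 1                      # S(n, n) = 1; also covers the assumed S(0, 0) = 1
    if k == 0 or k > n:
        return 0                      # no partitions with 0 blocks or too many blocks
    # n alone in its block, or n added to one of k existing blocks
    return stirling2(n - 1, k - 1) + k * stirling2(n - 1, k)


def bell(n):
    """Total number of partitions of an n-set: the sum of one row of the table."""
    return sum(stirling2(n, k) for k in range(1, n + 1))


print([stirling2(4, k) for k in range(1, 5)])   # row n = 4
print(bell(4))                                  # the 15 partitions of {1, 2, 3, 4}
```

Caching with `lru_cache` means each pair `(n, k)` is computed once, which is the same work as filling in a table row by row.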

Below is the tabular form for S(n, k ) analogous to the similar tabular form for C(n, k). 



Notice that the starting conditions for this table are that $S(n, 1) = 1$ for all $n \geq 1$ and
$S(n, n) = 1$ for all $n \geq 1$. The values for $n = 7$ are omitted from the table. You should
fill them in to test your understanding of this computational process. For each $n$, the total
number of partitions of a set of size $n$ is equal to the sum $S(n, 1) + S(n, 2) + \cdots + S(n, n)$.
These numbers, gotten by summing the entries in the rows of the above table, are the Bell
numbers, $B(n)$, that we discussed in Section 1. □

Example 21 (Counting functions by image size) Suppose $A$ and $B$ are sets. Let
$|A| = m$ and $|B| = n$. Suppose $k \leq m$ and $k \leq n$. A basic question about functions
$f : A \to B$ is the following:

Let $S = \{f \mid f : A \to B,\ |\text{Image}(f)| = k\}$. Find $|S|$.

In other words, there are exactly $k$ blocks in the coimage of $f$. This question clearly involves
the Stirling numbers. In fact, the answer is $|S| = \binom{n}{k} S(m, k)\, k!$. The idea is to choose
the image of the function in $\binom{n}{k}$ ways, then choose the coimage of the function in
$S(m, k)$ ways and then put them together in $k!$ ways. You will get a chance to fill in the
details in the last two exercises below. Here is an example. Suppose we take $|A| = 4$,
$|B| = 5$, and $k = 3$. We get $|S| = \binom{5}{3} S(4, 3)\, 3! = 10 \times 6 \times 6 = 360$.

Let’s look at some special cases. 

• If $k = |B|$, then we are counting surjections. Why? We are given $|\text{Image}(f)| = |B|$.
Thus every element in $B$ must be in Image($f$).

• If $k = |A|$, then we are counting injections. Why? Suppose $f$ is not an injection, say
$f(a) = f(b)$ for some $a \neq b$. Then $f$ can take on at most $|A| - 1$ different values. But
$k = |A|$ says that $|\text{Image}(f)| = |A|$, a contradiction. □
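The formula $|S| = \binom{n}{k} S(m, k)\, k!$ can be compared against a direct enumeration for small sets. The sketch below is a sanity check, not a proof; the function names are illustrative and the Stirling numbers are computed from the recursion $S(n, k) = S(n-1, k-1) + k\,S(n-1, k)$.

```python
from itertools import product
from math import comb, factorial


def stirling2(n, k):
    """Stirling number of the second kind, from the recursion."""
    if n == k:
        return 1
    if k == 0 or k > n:
        return 0
    return stirling2(n - 1, k - 1) + k * stirling2(n - 1, k)


def count_by_formula(m, n, k):
    """Choose the image, choose the coimage, then match blocks to values."""
    return comb(n, k) * stirling2(m, k) * factorial(k)


def count_by_enumeration(m, n, k):
    """List all n**m functions in one-line form and keep those with k distinct values."""
    return sum(1 for f in product(range(n), repeat=m) if len(set(f)) == k)


print(count_by_formula(4, 5, 3), count_by_enumeration(4, 5, 3))
# Summing over all image sizes recovers the total number of functions, 5**4.
print(sum(count_by_formula(4, 5, k) for k in range(1, 5)))
```

The enumeration is only feasible for small $m$ and $n$, since it visits all $n^m$ functions, whereas the formula is fast for any sizes.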


Exercises for Section 2 

2.1. Let $R$ be the relation on $\mathbb{R} \times \mathbb{R}$, the Cartesian plane, defined by
$x \mathrel{R} y$ if $y = x^2$. Sketch a picture that represents the set $R$ in
$\mathbb{R} \times \mathbb{R}$.

2.2. Let $R$ be a relation from the power set $\mathcal{P}(X)$ to itself, where $X = \{1,2,3,4\}$,
defined by $A \mathrel{R} B$ if $A \cap B \neq \emptyset$.

(a) Is $A \mathrel{R} A$ for all $A \in \mathcal{P}(X)$?

(b) For any $A, B \in \mathcal{P}(X)$, if $A \mathrel{R} B$ is $B \mathrel{R} A$?

(c) For any $A, B, C \in \mathcal{P}(X)$, if $A \mathrel{R} B$ and $B \mathrel{R} C$, is $A \mathrel{R} C$?

2.3. In each case, draw the “directed graph diagram” of the given relation (label points
in your diagram with the elements of $S$, put an arrow from $x$ to $y$ if and only if
$(x, y)$ belongs to the relation).

(a) $S = \{(a, b), (a, c), (b, c), (d, d)\}$ on $X \times X$, $X = \{a, b, c, d\}$.

(b) Let $X = \{2, 3, 4, 5, 6, 7, 8\}$ and define $x \mathrel{R} y$ if $x \equiv y \pmod 3$; that is, if
$|x - y|$ is divisible by 3.

2.4. Find all relations on $\{a, b\} \times \{x, y\}$ that are not functional.

2.5. Let $S$ be the divides relation on $\{3, 4, 5\} \times \{4, 5, 6\}$; that is, $x \mathrel{S} y$ if
$y/x$ is an integer. List the elements of $S$ and $S^{-1}$.
2.6. Let $A$ be a set with $m$ elements and $B$ be a set with $n$ elements.

(a) How many relations are there on $A \times B$?

(b) How many functions are there from $A$ to $B$?

2.7. Define a binary relation $D$ from $\underline{10} = \{1, 2, 3, 4, 5, 6, 7, 8, 9, 10\}$ to
$\underline{10}$ as follows: For all $x, y$ in $\underline{10}$, $x \mathrel{D} y$ if $x < y$ and $x$
divides $y$. How many edges are there in the directed graph of this relation? Explain.

2.8. This exercise lets you check your understanding of the definitions. In each case 
below, some information about a function is given to you. Answer the following 
questions and give reasons for your answers: 

• Have you been given enough information to specify the function? 

• Can you tell whether or not the function is an injection? a surjection? a bijection?

• If possible, give the function in two-line form.

(a) $f = (3, 1, 2, 3)$.

(b) $f \in \{>, <, +, ?\}^{\underline{3}}$, $f = (?, <, +)$.

(c) $f \in$ #, $2 \to 3$, $1 \to 4$, $3 \to 2$.

2.9. This exercise lets you check your understanding of the definitions. In each case 
below, some information about a function is given to you. Answer the following 
questions and give reasons for your answers: 

• Have you been given enough information to specify the function? 

• Can you tell whether or not the function is an injection? a surjection? a 
bijection? If so, what is it? 


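(a) $f \in$ #, Coimage($f$) $= \{\{1, 3, 5\}, \{2, 4\}\}$.

(b) $f \in \underline{5}^{\underline{5}}$, Coimage($f$) $= \{\{1\}, \{2\}, \{3\}, \{4\}, \{5\}\}$.

(c) $f \in$ #, $f^{-1}(2) = \{1, 3, 5\}$, $f^{-1}(4) = \{2, 4\}$.

(d) $f \in$ #, $|\text{Image}(f)| = 4$.

(e) $f \in$ #, $|\text{Image}(f)| = 5$.

(f) $f \in$ #, $|\text{Coimage}(f)| = 5$.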


2.10. For each of the following definitions, state whether the definition is correct or not 
correct. If not correct, explain why. 

(a) Definition: $f : A \to B$ is one-to-one if $\forall s, t \in A$, $f(s) = f(t)$ implies $s = t$.

(b) Definition: $f : A \to B$ is one-to-one if $\forall s, t \in A$, $s \neq t$ implies $f(s) \neq f(t)$.

(c) Definition: $f : A \to B$ is one-to-one if $\forall s \in A$, $\exists!\, t \in B$ such that $f(s) = t$.

(d) Definition: $f : A \to B$ is one-to-one if $\forall t \in B$, $\exists$ at most one $s \in A$ such that
$f(s) = t$.

2.11. Define $g : \mathbb{Z} \to \mathbb{Z}$ by $g(n) = 3n - 1$, where $\mathbb{Z}$ is the set of integers.

(a) Is $g$ one-to-one?

(b) Is $g$ onto?

(c) Suppose that $g : \mathbb{R} \to \mathbb{R}$ and $g(x) = 3x - 1$ for all real numbers $x$. Is $g$ onto?

2.12. In each case prove or disprove the statement “$f$ is one-to-one.”

(a) $f(x) =$ where $f : \mathbb{R} \to \mathbb{R}$

(b) $f(x) =$ where $f : (\mathbb{R} - \{0\}) \to \mathbb{R}$

(c) $f(x) =$ where $f : (\mathbb{R} - \{-1\}) \to \mathbb{R}$

2.13. This exercise lets you check your understanding of cycle form. A permutation is 
given in one-line, two-line or cycle form. Convert it to the other two forms. Give 
its inverse in all three forms. 

(a) (1,5,7,8) (2,3) (4) (6). 

(b) $\begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\ 8 & 3 & 7 & 2 & 6 & 4 & 5 & 1 \end{pmatrix}$

(c) $(5, 4, 3, 2, 1)$, which is in one-line form.

(d) $(5, 4, 3, 2, 1)$, which is in cycle form. (Assume the domain is $\underline{5}$.)

2.14. Let $S = \{f \mid f : A \to B,\ f \text{ one-to-one}\}$. In each case, find $|S|$.

(a) $|A| = 3$ and $|B| = 3$

(b) $|A| = 3$ and $|B| = 5$

(c) $|A| = m$ and $|B| = n$

2.15. Let $f : X \to Y$, $g : Y \to Z$ and $g \circ f : X \to Z$.

(a) If $g \circ f$ is onto, must $f$ and $g$ be onto?

(b) If $g \circ f$ is one-to-one, must $f$ and $g$ be one-to-one?

2.16. In each case, $f : X \to Y$ is an arbitrary function. Prove or disprove:

(a) For all subsets $A$ and $B$ of $X$, $f(A \cup B) = f(A) \cup f(B)$.

(b) For all subsets $A$ and $B$ of $X$, $f(A \cap B) = f(A) \cap f(B)$.

(c) For all subsets $A$ and $B$ of $X$, $f(A - B) = f(A) - f(B)$.

(d) For all subsets $C$ and $D$ of $Y$, $f^{-1}(C \cap D) = f^{-1}(C) \cap f^{-1}(D)$.

2.17. Let $f : X \to Y$, $g : Y \to Z$ and $g \circ f : X \to Z$. Prove or disprove:

(a) For all subsets $A \subseteq X$, $f^{-1}(f(A)) = A$.

(b) For all subsets $B \subseteq Y$, $f(f^{-1}(B)) = B$.

(c) For all subsets $E \subseteq Z$, $(g \circ f)^{-1}(E) = f^{-1}(g^{-1}(E))$.

2.18. Let $S = \{f \mid f : A \to B,\ f \text{ onto}\}$. In each case, find $|S|$. Do (a)-(d) without
using the general formula in the text.

(a) $|A| = 3$ and $|B| = 2$

(b) $|A| = 3$ and $|B| = 5$

(c) $|A| = 4$ and $|B| = 2$

(d) $|A| = m$ and $|B| = n$

(e) Explain how to use the general formula in the text to solve (d).

2.19. Let $S = \{f \mid f : A \to B,\ |\text{Image}(f)| = k\}$. Suppose that $|A| = m \geq k$ and
$|B| = n \geq k$. Prove the formula for $|S|$ given in the text.

Multiple Choice Questions for Review

In each case there is one correct answer (given at the end of the problem set). Try
to work the problem first without looking at the answer. Understand both why the
correct answer is correct and why the other answers are wrong.

1. Which of the following statements is FALSE? 

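(a) $2 \in A \cup B$ implies that if $2 \notin A$ then $2 \in B$.

(b) $\{2, 3\} \subseteq A$ implies that $2 \in A$ and $3 \in A$.

(c) $A \cap B \supseteq \{2, 3\}$ implies that $\{2, 3\} \subseteq A$ and $\{2, 3\} \subseteq B$.

(d) $A - B \supseteq \{3\}$ and $\{2\} \subseteq B$ implies that $\{2, 3\} \subseteq A \cup B$.

(e) $\{2\} \in A$ and $\{3\} \in A$ implies that $\{2, 3\} \subseteq A$.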

2. Let A = {0,1} x {0,1} and B = {a,b,c}. Suppose A is listed in lexicographic order 
based on 0 < 1 and B is in alphabetic order. If A x B x A is listed in lexicographic 
order, then the next element after ((1,0), c, (1,1)) is 

(a) ((1,0), a, (0,0)) 

(b) ((1,1), c, (0, 0)) 

(c) ((1,1), a, (0,0)) 

(d) ((1,1), a, (1,1)) 

(e) ((1,1), b, (1,1))

3. Which of the following statements is TRUE? 

(a) For all sets $A$, $B$, and $C$, $A - (B - C) = (A - B) - C$.

(b) For all sets $A$, $B$, and $C$, $(A - B) \cap (C - B) = (A \cap C) - B$.

(c) For all sets $A$, $B$, and $C$, $(A - B) \cap (C - B) = A - (B \cup C)$.

(d) For all sets $A$, $B$, and $C$, if $A \cap C = B \cap C$ then $A = B$.

(e) For all sets $A$, $B$, and $C$, if $A \cup C = B \cup C$ then $A = B$.

4. Which of the following statements is FALSE? 

(a) $C - (B \cup A) = (C - B) - A$

(b) $A - (C \cup B) = (A - B) - C$

(c) $B - (A \cup C) = (B - C) - A$

(d) $A - (B \cup C) = (B - C) - A$

(e) $A - (B \cup C) = (A - C) - B$

5. Consider the true theorem, “For all sets $A$ and $B$, if $A \subseteq B$ then
$A \cap B^c = \emptyset$.” Which of the following statements is NOT equivalent to this statement:

(a) For all sets $A^c$ and $B$, if $A \subseteq B$ then $A^c \cap B^c = \emptyset$.

(b) For all sets $A$ and $B$, if $A^c \subseteq B$ then $A^c \cap B^c = \emptyset$.

(c) For all sets $A^c$ and $B^c$, if $A \subseteq B^c$ then $A \cap B = \emptyset$.

(d) For all sets $A^c$ and $B^c$, if $A^c \subseteq B^c$ then $A^c \cap B = \emptyset$.

(e) For all sets $A$ and $B$, if $A^c \supseteq B$ then $A \cap B = \emptyset$.

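6. The power set $\mathcal{P}((A \times B) \cup (B \times A))$ has the same number of elements as
the power set $\mathcal{P}((A \times B) \cup (A \times B))$ if and only if

(a) $A = B$

(b) $A = \emptyset$ or $B = \emptyset$

(c) $B = \emptyset$ or $A = B$

(d) $A = \emptyset$ or $B = \emptyset$ or $A = B$

(e) $A = \emptyset$ or $B = \emptyset$ or $A \cap B = \emptyset$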

7. Let $\sigma = 452631$ be a permutation on $\{1, 2, 3, 4, 5, 6\}$ in one-line notation (based on
the usual order on integers). Which of the following is NOT a correct cycle notation for
$\sigma$?

(a) (614)(532) 

(b) (461)(352) 

(c) (253)(146) 

(d) (325)(614) 

(e) (614)(253) 

8. Let $f : X \to Y$. Consider the statement, “For all subsets $C$ and $D$ of $Y$,
$f^{-1}(C \cap D^c) = f^{-1}(C) \cap [f^{-1}(D)]^c$.” This statement is

(a) True and equivalent to:

For all subsets $C$ and $D$ of $Y$, $f^{-1}(C - D) = f^{-1}(C) - f^{-1}(D)$.

(b) False and equivalent to:

For all subsets $C$ and $D$ of $Y$, $f^{-1}(C - D) = f^{-1}(C) - f^{-1}(D)$.

(c) True and equivalent to:

For all subsets $C$ and $D$ of $Y$, $f^{-1}(C - D) = f^{-1}(C) - [f^{-1}(D)]^c$.

(d) False and equivalent to:

For all subsets $C$ and $D$ of $Y$, $f^{-1}(C - D) = f^{-1}(C) - [f^{-1}(D)]^c$.

(e) True and equivalent to:

For all subsets $C$ and $D$ of $Y$, $f^{-1}(C - D) = [f^{-1}(C)]^c - f^{-1}(D)$.

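9. Define $f(n) = \frac{n}{2} + \frac{1 - (-1)^n}{4}$ for all $n \in \mathbb{Z}$. Thus,
$f : \mathbb{Z} \to \mathbb{Z}$, $\mathbb{Z}$ the set of all integers. Which is correct?

(a) $f$ is not a function from $\mathbb{Z} \to \mathbb{Z}$ because $\frac{n}{2} \notin \mathbb{Z}$.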

(b) $f$ is a function and is onto and one-to-one.

(c) $f$ is a function and is not onto but is one-to-one.

(d) $f$ is a function and is not onto and not one-to-one.

(e) $f$ is a function and is onto but not one-to-one.

10. The number of partitions of $\{1, 2, 3, 4, 5\}$ into three blocks is $S(5, 3) = 25$. The total
number of functions $f : \{1, 2, 3, 4, 5\} \to \{1, 2, 3, 4\}$ with $|\text{Image}(f)| = 3$ is

(a) 4x6 

(b) 4 x 25 

(c) 25 x 6 

(d) 4 x 25 x 6 

(e) 3 x 25 x 6 

11. Let $f : X \to Y$ and $g : Y \to Z$. Let $h = g \circ f : X \to Z$. Suppose $g$ is one-to-one and
onto. Which of the following is FALSE?

(a) If $f$ is one-to-one then $h$ is one-to-one and onto.

(b) If $f$ is not onto then $h$ is not onto.

(c) If $f$ is not one-to-one then $h$ is not one-to-one.

(d) If $f$ is one-to-one then $h$ is one-to-one.

(e) If $f$ is onto then $h$ is onto.

12. Which of the following statements is FALSE? 

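(a) $\{2, 3, 4\} \subseteq A$ implies that $2 \in A$ and $\{3, 4\} \subseteq A$.

(b) $\{2, 3, 4\} \in A$ and $\{2, 3\} \in B$ implies that $\{4\} \subseteq A - B$.

(c) $A \cap B \supseteq \{2, 3, 4\}$ implies that $\{2, 3, 4\} \subseteq A$ and $\{2, 3, 4\} \subseteq B$.

(d) $A - B \supseteq \{3, 4\}$ and $\{1, 2\} \subseteq B$ implies that $\{1, 2, 3, 4\} \subseteq A \cup B$.

(e) $\{2, 3\} \subseteq A \cup B$ implies that if $\{2, 3\} \cap A = \emptyset$ then $\{2, 3\} \subseteq B$.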

13. Let A = {0,1} x {0,1} x {0,1} and B = {a,b,c} x {a,b,c} x {a,b,c}. Suppose A 
is listed in lexicographic order based on 0 < 1 and B is listed in lexicographic order 
based on a < b < c. If A x B x A is listed in lexicographic order, then the next element 
after ((0,1,1), (c, c, c), (1,1,1)) is 

(a) ((1,0,1), (a, a, b), (0,0,0))

(b) ((1,0,0), (b, a, a), (0,0,0))

(c) ((1,0,0), (a, a, a), (0,0,1)) 

(d) ((1,0,0), (a, a, a), (1,0,0)) 

(e) ((1,0,0), (a, a, a), (0,0,0)) 

14. Consider the true theorem, “For all sets $A$, $B$, and $C$, if $A \subseteq B \subseteq C$ then $C^c \subseteq B^c \subseteq
A^c$.” Which of the following statements is NOT equivalent to this statement:

(a) For all sets $A^c$, $B^c$, and $C^c$, if $A^c \subseteq B^c \subseteq C^c$ then $C \subseteq B \subseteq A$.

(b) For all sets $A^c$, $B$, and $C^c$, if $A^c \subseteq B \subseteq C^c$ then $C \subseteq B^c \subseteq A$.

(c) For all sets $A$, $B$, and $C^c$, if $A^c \subseteq B \subseteq C$ then $C^c \subseteq B^c \subseteq A$.

(d) For all sets $A^c$, $B$, and $C^c$, if $A^c \subseteq B^c \subseteq C$ then $C^c \subseteq B^c \subseteq A$.

(e) For all sets $A^c$, $B^c$, and $C^c$, if $A^c \subseteq B^c \subseteq C$ then $C^c \subseteq B \subseteq A$.

15. Let $\mathcal{P}(A)$ denote the power set of $A$. If $\mathcal{P}(A) \subseteq B$ then

(a) $2^{|A|} \le |B|$

(b) $2^{|A|} \ge |B|$

(c) $2|A| < |B|$

(d) $|A| + 2 \le |B|$

(e) $2^{|A|} \ge 2^{|B|}$

16. Let $f : \{1, 2, 3, 4, 5, 6, 7, 8, 9\} \to \{a, b, c, d, e\}$. In one-line notation, $f = (e, a, b, b, a, c, c, a, c)$
(use number order on the domain). Which is correct?

(a) Image($f$) = $\{a, b, c, d, e\}$, Coimage($f$) = $\{\{6, 7, 9\}, \{2, 5, 8\}, \{3, 4\}, \{1\}\}$

(b) Image($f$) = $\{a, b, c, e\}$, Coimage($f$) = $\{\{6, 7, 9\}, \{2, 5, 8\}, \{3, 4\}\}$

(c) Image($f$) = $\{a, b, c, e\}$, Coimage($f$) = $\{\{6, 7, 9\}, \{2, 5, 8\}, \{3, 4\}, \{1\}\}$

(d) Image($f$) = $\{a, b, c, e\}$, Coimage($f$) = $\{\{6, 7, 9, 2, 5, 8\}, \{3, 4\}, \{1\}\}$

(e) Image($f$) = $\{a, b, c, d, e\}$, Coimage($f$) = $\{\{1\}, \{3, 4\}, \{2, 5, 8\}, \{6, 7, 9\}\}$

17. Let $\Sigma = \{x, y\}$ be an alphabet. The strings of length seven over $\Sigma$ are listed in
dictionary (lex) order. What is the first string after xxxxyxx that is a palindrome
(same read forwards and backwards)?

(a) xxxxyxy (b) xxxyxxx (c) xxyxyxx (d) xxyyyxx (e) xyxxxyx 

18. Let $\sigma = 681235947$ and $\tau = 627184593$ be permutations on $\{1, 2, 3, 4, 5, 6, 7, 8, 9\}$ in
one-line notation (based on the usual order on integers). Which of the following is a
correct cycle notation for $\tau \circ \sigma$?

(a) (124957368) 

(b) (142597368) 

(c) (142953768) 

(d) (142957368) 

(e) (142957386) 
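
Questions 7, 10, and 18 above can be checked mechanically. The following is a minimal Python sketch and not part of the original quiz; the helper names `cycles`, `compose`, and `count_functions_with_image_size` are illustrative assumptions, and permutations are passed as strings of digits in one-line notation.

```python
from itertools import product


def cycles(one_line):
    """Cycle decomposition of a permutation of 1..n given in one-line notation."""
    perm = {i + 1: int(ch) for i, ch in enumerate(one_line)}
    seen, result = set(), []
    for start in sorted(perm):
        if start in seen:
            continue
        cycle, x = [], start
        while x not in seen:
            seen.add(x)
            cycle.append(x)
            x = perm[x]
        result.append(tuple(cycle))
    return result


def compose(tau, sigma):
    """One-line notation of tau o sigma (apply sigma first, then tau)."""
    return "".join(tau[int(ch) - 1] for ch in sigma)


def count_functions_with_image_size(domain_size, codomain_size, image_size):
    """Brute-force count of functions whose image has exactly image_size elements."""
    codomain = range(1, codomain_size + 1)
    return sum(
        1
        for values in product(codomain, repeat=domain_size)
        if len(set(values)) == image_size
    )


print(cycles("452631"))                            # question 7
print(cycles(compose("627184593", "681235947")))   # question 18
print(count_functions_with_image_size(5, 4, 3))    # question 10
```

A cycle may be written starting from any of its entries and the cycles may be listed in any order, so each printed decomposition has to be compared with the options up to such rearrangements.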

Answers: 1 (e), 2 (c), 3 (b), 4 (d), 5 (a), 6 (d), 7 (b), 8 (a), 9 (e), 10 (d), 11 (a),
12 (b), 13 (e), 14 (d), 15 (a), 16 (c), 17 (b), 18 (d).

We understand a “set” to be any collection M of certain distinct objects
of our thought or intuition (called the “elements” of M) into a whole.
(Georg Cantor, 1895)

In mathematics you don’t understand things. You just get used to them.
(Attributed to John von Neumann)

In this chapter, we define sets, functions, and relations and discuss some of 
their general properties. This material can be referred back to as needed in the 
subsequent chapters. 


1.1. Sets

A set is a collection of objects, called the elements or members of the set. The 
objects could be anything (planets, squirrels, characters in Shakespeare's plays, or 
other sets) but for us they will be mathematical objects such as numbers, or sets 
of numbers. We write $x \in X$ if $x$ is an element of the set $X$ and $x \notin X$ if $x$ is not
an element of $X$.

If the definition of a “set” as a “collection” seems circular, that’s because it
is. Conceiving of many objects as a single whole is a basic intuition that cannot
be analyzed further, and the notions of “set” and “membership” are primitive
ones. These notions can be made mathematically precise by introducing a system 
of axioms for sets and membership that agrees with our intuition and proving other 
set-theoretic properties from the axioms. 

The most commonly used axioms for sets are the ZFC axioms, named somewhat
inconsistently after two of their founders (Zermelo and Fraenkel) and one of their
axioms (the Axiom of Choice). We won’t state these axioms here; instead, we use
“naive” set theory, based on the intuitive properties of sets. Nevertheless, all the
set-theory arguments we use can be rigorously formalized within the ZFC system.

Sets are determined entirely by their elements. Thus, the sets $X$, $Y$ are equal,
written $X = Y$, if

$x \in X$ if and only if $x \in Y$.

It is convenient to define the empty set, denoted by $\emptyset$, as the set with no elements.
(Since sets are determined by their elements, there is only one set with no elements!)
If $X \neq \emptyset$, meaning that $X$ has at least one element, then we say that $X$ is
non-empty.

We can define a finite set by listing its elements (between curly brackets). For 
example, 

X = {2,3,5,7,11} 

is a set with five elements. The order in which the elements are listed or repetitions 
of the same element are irrelevant. Alternatively, we can define X as the set whose 
elements are the first five prime numbers. It doesn’t matter how we specify the 
elements of X, only that they are the same. 
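
As an informal illustration (not part of the original text), Python's built-in `set` type behaves the same way: order and repetition in a listing are ignored, and a set built by a rule equals a set built by listing. The helper `first_primes` is a hypothetical name introduced for this sketch.

```python
# Two descriptions of the same finite set: an explicit list (with a
# repetition and a different order) and a rule ("the first five primes").
listed = {11, 7, 5, 3, 2, 2}


def first_primes(count):
    """Return the set of the first `count` prime numbers by trial division."""
    primes = []
    candidate = 2
    while len(primes) < count:
        if all(candidate % p != 0 for p in primes):
            primes.append(candidate)
        candidate += 1
    return set(primes)


by_rule = first_primes(5)
print(listed == by_rule)   # equality depends only on the elements
print(len(listed))         # the repeated 2 is counted once
```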

Infinite sets can’t be defined by explicitly listing all of their elements. Nevertheless,
we will adopt a realist (or “platonist”) approach towards arbitrary infinite
sets and regard them as well-defined totalities. In constructive mathematics and 
computer science, one may be interested only in sets that can be defined by a rule or 
algorithm — for example, the set of all prime numbers — rather than by infinitely 
many arbitrary specifications, and there are some mathematicians who consider 
infinite sets to be meaningless without some way of constructing them. Similar 
issues arise with the notion of arbitrary subsets, functions, and relations. 

1.1.1. Numbers. The infinite sets we use are derived from the natural and real 
numbers, about which we have a direct intuitive understanding. 

Our understanding of the natural numbers 1,2,3,... derives from counting. 
We denote the set of natural numbers by 

N={1,2,3,...}. 

We define $\mathbb{N}$ so that it starts at 1. In set theory and logic, the natural numbers
are defined to start at zero, but we denote this set by $\mathbb{N}_0 = \{0, 1, 2, \dots\}$. Historically,
the number 0 was a later addition to the number system, primarily by Indian
mathematicians in the 5th century AD. The ancient Greek mathematicians, such
as Euclid, defined a number as a multiplicity and didn’t consider 1 to be a number
either.

Our understanding of the real numbers derives from durations of time and 
lengths in space. We think of the real line, or continuum, as being composed of an 
(uncountably) infinite number of points, each of which corresponds to a real number, 
and denote the set of real numbers by R. There are philosophical questions, going 
back at least to Zeno’s paradoxes, about whether the continuum can be represented 
as a set of points, and a number of mathematicians have disputed this assumption 
or introduced alternative models of the continuum. There are, however, no known 
inconsistencies in treating R as a set of points, and since Cantor’s work it has been 
the dominant point of view in mathematics because of its precision, power, and 
simplicity.

We denote the set of (positive, negative and zero) integers by

$$\mathbb{Z} = \{\dots, -3, -2, -1, 0, 1, 2, 3, \dots\},$$

and the set of rational numbers (ratios of integers) by

$$\mathbb{Q} = \{p/q : p, q \in \mathbb{Z} \text{ and } q \neq 0\}.$$

The letter “Z” comes from “zahl” (German for “number”) and “Q” comes from 
“quotient.” These number systems are discussed further in Chapter 2. 

Although we will not develop any complex analysis here, we occasionally make 
use of complex numbers. We denote the set of complex numbers by 

$$\mathbb{C} = \{x + iy : x, y \in \mathbb{R}\},$$

where we add and multiply complex numbers in the natural way, with the additional
identity that $i^2 = -1$, meaning that $i$ is a square root of $-1$. If $z = x + iy \in \mathbb{C}$, we
call $x = \Re z$ the real part of $z$ and $y = \Im z$ the imaginary part of $z$, and we call

$$|z| = \sqrt{x^2 + y^2}$$

the absolute value, or modulus, of $z$. Two complex numbers $z = x + iy$, $w = u + iv$
are equal if and only if $x = u$ and $y = v$.

1.1.2. Subsets. A set $A$ is a subset of a set $X$, written $A \subset X$ or $X \supset A$, if
every element of $A$ belongs to $X$; that is, if

$x \in A$ implies that $x \in X$.

We also say that $A$ is included in $X$.¹ For example, if $P$ is the set of prime numbers,
then $P \subset \mathbb{N}$, and $\mathbb{N} \subset \mathbb{R}$. The empty set $\emptyset$ and the whole set $X$ are subsets of any
set $X$. Note that $X = Y$ if and only if $X \subset Y$ and $Y \subset X$; we often prove the
equality of two sets by showing that each one includes the other.

In our notation, $A \subset X$ does not imply that $A$ is a proper subset of $X$ (that
is, a subset of $X$ not equal to $X$ itself), and we may have $A = X$. This notation
for non-strict inclusion is not universal; some authors use $A \subset X$ to denote strict
inclusion, in which $A \neq X$, and $A \subseteq X$ to denote non-strict inclusion, in which
$A = X$ is allowed.

Definition 1.1. The power set $\mathcal{P}(X)$ of a set $X$ is the set of all subsets of $X$.

Example 1.2. If $X = \{1, 2, 3\}$, then

$$\mathcal{P}(X) = \{\emptyset, \{1\}, \{2\}, \{3\}, \{2, 3\}, \{1, 3\}, \{1, 2\}, \{1, 2, 3\}\}.$$

The power set of a finite set with $n$ elements has $2^n$ elements because, in
defining a subset, we have two independent choices for each element (does it belong
to the subset or not?). In Example 1.2, $X$ has 3 elements and $\mathcal{P}(X)$ has $2^3 = 8$
elements.
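
The count of subsets can be checked for small finite sets. The sketch below is an illustration added here, with `power_set` a hypothetical helper name; it uses `frozenset` because Python sets can only contain hashable elements.

```python
from itertools import combinations


def power_set(elements):
    """Return the set of all subsets of `elements`, each as a frozenset."""
    items = list(elements)
    return {
        frozenset(subset)
        for size in range(len(items) + 1)
        for subset in combinations(items, size)
    }


subsets = power_set({1, 2, 3})
print(len(subsets))                      # 2**3 subsets
print(frozenset() in subsets)            # the empty set is a subset
print(frozenset({1, 2, 3}) in subsets)   # so is the whole set
```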

The power set of an infinite set, such as $\mathbb{N}$, consists of all finite and infinite
subsets and is infinite. We can define finite subsets of $\mathbb{N}$, or subsets with finite
complements, by listing finitely many elements. Some infinite subsets, such as
the set of primes or the set of squares, can be defined by giving a definite rule
for membership. We imagine that a general subset $A \subset \mathbb{N}$ is “defined” by going
through the elements of $\mathbb{N}$ one by one and deciding for each $n \in \mathbb{N}$ whether $n \in A$
or $n \notin A$.

¹ By contrast, we say that an element $x \in X$ is contained in $X$, in which case the singleton set
$\{x\}$ is included in $X$. This terminological distinction is not universal, but it is almost always clear from
the context whether one is referring to an element of a set or a subset of a set. In fact, before the
development of the contemporary notation for set theory, Dedekind [3] used the same symbol ($\subset$) to
denote both membership of elements and inclusion of subsets.

If $X$ is a set and $P$ is a property of elements of $X$, we denote the subset of $X$
consisting of elements with the property $P$ by $\{x \in X : P(x)\}$.

Example 1.3. The set

$$\{n \in \mathbb{N} : n = k^2 \text{ for some } k \in \mathbb{N}\}$$

is the set of perfect squares $\{1, 4, 9, 16, 25, \dots\}$. The set

$$\{x \in \mathbb{R} : 0 < x < 1\}$$

is the open interval $(0, 1)$.
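
Set-builder notation has a close analogue in Python's set comprehensions. The following sketch is an added illustration under two stated assumptions: a finite window of the natural numbers (the bound 30 is arbitrary) stands in for the infinite set, and an uncountable set such as the open interval is represented only by its membership test.

```python
from math import isqrt

# A finite window of N stands in for the infinite set; the bound 30 is an
# arbitrary choice made for this illustration.
window = range(1, 31)

# {n in N : n = k^2 for some k in N}, restricted to the window
squares = {n for n in window if isqrt(n) ** 2 == n}
print(sorted(squares))


def in_open_unit_interval(x):
    """Membership test for {x in R : 0 < x < 1}."""
    return 0 < x < 1


print(in_open_unit_interval(0.5), in_open_unit_interval(1.0))
```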

1.1.3. Set operations. The intersection $A \cap B$ of two sets $A$, $B$ is the set of
all elements that belong to both $A$ and $B$; that is,

$x \in A \cap B$ if and only if $x \in A$ and $x \in B$.

Two sets $A$, $B$ are said to be disjoint if $A \cap B = \emptyset$; that is, if $A$ and $B$ have no
elements in common.

The union $A \cup B$ is the set of all elements that belong to $A$ or $B$; that is,

$x \in A \cup B$ if and only if $x \in A$ or $x \in B$.

Note that we always use ‘or’ in an inclusive sense, so that $x \in A \cup B$ if $x$ is an
element of $A$ or $B$, or both $A$ and $B$. (Thus, $A \cap B \subset A \cup B$.)

The set-difference of two sets $B$ and $A$ is the set of elements of $B$ that do not
belong to $A$,

$$B \setminus A = \{x \in B : x \notin A\}.$$

If we consider sets that are subsets of a fixed set $X$ that is understood from the
context, then we write $A^c = X \setminus A$ to denote the complement of $A \subset X$ in $X$. Note
that $(A^c)^c = A$.

Example 1.4. If 

$$A = \{2, 3, 5, 7, 11\}, \qquad B = \{1, 3, 5, 7, 9, 11\}$$

then

$$A \cap B = \{3, 5, 7, 11\}, \qquad A \cup B = \{1, 2, 3, 5, 7, 9, 11\}.$$

Thus, $A \cap B$ consists of the natural numbers between 1 and 11 that are both prime
and odd, while A U B consists of the numbers that are either prime or odd (or 
both). The set differences of these sets are 

B\A = { 1,9}, A\B = {2} . 

Thus, $B \setminus A$ is the set of odd numbers between 1 and 11 that are not prime, and
$A \setminus B$ is the set of prime numbers that are not odd.

These set operations may be represented by Venn diagrams, which can be used
to visualize their properties. In particular, if $A, B \subset X$, we have De Morgan’s laws:

$$(A \cup B)^c = A^c \cap B^c, \qquad (A \cap B)^c = A^c \cup B^c.$$
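
The computations of Example 1.4 and De Morgan's laws can be reproduced with Python's set operators. This is an added sketch; the ambient set `X` (the integers from 1 to 11) and the helper `complement` are assumptions chosen for the illustration.

```python
A = {2, 3, 5, 7, 11}
B = {1, 3, 5, 7, 9, 11}
X = set(range(1, 12))          # assumed ambient set {1, ..., 11}


def complement(S):
    """Complement of S relative to the ambient set X."""
    return X - S


print(A & B, A | B)            # intersection and union
print(B - A, A - B)            # set differences

# De Morgan's laws for this particular pair of sets
print(complement(A | B) == complement(A) & complement(B))
print(complement(A & B) == complement(A) | complement(B))
```

Checking one pair of sets is of course not a proof; it only confirms the identities in this instance.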

The definitions of union and intersection extend to larger collections of sets in 
a natural way. 

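Definition 1.5. Let $\mathcal{C}$ be a collection of sets. Then the union of $\mathcal{C}$ is

$$\bigcup \mathcal{C} = \{x : x \in X \text{ for some } X \in \mathcal{C}\},$$

and the intersection of $\mathcal{C}$ is

$$\bigcap \mathcal{C} = \{x : x \in X \text{ for every } X \in \mathcal{C}\}.$$

If $\mathcal{C} = \{A, B\}$, then this definition reduces to our previous one for $A \cup B$ and
$A \cap B$.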

The Cartesian product $X \times Y$ of sets $X$, $Y$ is the set of all ordered pairs $(x, y)$
with $x \in X$ and $y \in Y$. If $X = Y$, we often write $X \times X = X^2$. Two ordered
pairs $(x_1, y_1)$, $(x_2, y_2)$ in $X \times Y$ are equal if and only if $x_1 = x_2$ and $y_1 = y_2$. Thus,
$(x, y) \neq (y, x)$ unless $x = y$. This contrasts with sets where $\{x, y\} = \{y, x\}$.

Example 1.6. If $X = \{1, 2, 3\}$ and $Y = \{4, 5\}$ then

$$X \times Y = \{(1, 4), (1, 5), (2, 4), (2, 5), (3, 4), (3, 5)\}.$$
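
For finite sets the Cartesian product can be enumerated directly. The sketch below is an added illustration using `itertools.product` from the Python standard library; tuples play the role of ordered pairs.

```python
from itertools import product

X = [1, 2, 3]
Y = [4, 5]

pairs = list(product(X, Y))    # ordered pairs (x, y) with x in X, y in Y
print(pairs)
print(len(pairs) == len(X) * len(Y))

# Ordered pairs are sensitive to order, sets are not.
print((1, 4) == (4, 1), {1, 4} == {4, 1})
```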

Example 1.7. The Cartesian product of $\mathbb{R}$ with itself is the Cartesian plane $\mathbb{R}^2$
consisting of all points with coordinates $(x, y)$ where $x, y \in \mathbb{R}$.

The Cartesian product of finitely many sets is defined analogously.

Definition 1.8. The Cartesian product of $n$ sets $X_1, X_2, \dots, X_n$ is the set of
ordered $n$-tuples,

$$X_1 \times X_2 \times \dots \times X_n = \{(x_1, x_2, \dots, x_n) : x_i \in X_i \text{ for } i = 1, 2, \dots, n\},$$

where $(x_1, x_2, \dots, x_n) = (y_1, y_2, \dots, y_n)$ if and only if $x_i = y_i$ for every $i =
1, 2, \dots, n$.

1.2. Functions 

A function $f : X \to Y$ between sets $X$, $Y$ assigns to each $x \in X$ a unique element
$f(x) \in Y$. Functions are also called maps, mappings, or transformations. The set
$X$ on which $f$ is defined is called the domain of $f$ and the set $Y$ in which it takes
its values is called the codomain. We write $f : x \mapsto f(x)$ to indicate that $f$ is the
function that maps $x$ to $f(x)$.

Example 1.9. The identity function $\mathrm{id}_X : X \to X$ on a set $X$ is the function
$\mathrm{id}_X : x \mapsto x$ that maps every element to itself.

Example 1.10. Let $A \subset X$. The characteristic (or indicator) function of $A$,

$$\chi_A : X \to \{0, 1\},$$

is defined by

$$\chi_A(x) = \begin{cases} 1 & \text{if } x \in A, \\ 0 & \text{if } x \notin A. \end{cases}$$

Specifying the function $\chi_A$ is equivalent to specifying the subset $A$.


Example 1.11. Let $A$, $B$ be the sets in Example 1.4. We can define a function
$f : A \to B$ by

$$f(2) = 7, \quad f(3) = 1, \quad f(5) = 11, \quad f(7) = 3, \quad f(11) = 9,$$

and a function $g : B \to A$ by

$$g(1) = 3, \quad g(3) = 7, \quad g(5) = 2, \quad g(7) = 2, \quad g(9) = 5, \quad g(11) = 11.$$

Example 1.12. The square function $f : \mathbb{N} \to \mathbb{N}$ is defined by

$$f(n) = n^2,$$

which we also write as $f : n \mapsto n^2$. The equation $g(n) = \sqrt{n}$, where $\sqrt{n}$ is the
positive square root, defines a function $g : \mathbb{N} \to \mathbb{R}$, but $h(n) = \pm\sqrt{n}$ does not define
a function since it doesn’t specify a unique value for $h(n)$. Sometimes we use a
convenient oxymoron and refer to $h$ as a multi-valued function.


One way to specify a function is to explicitly list its values, as in Example 1.11.
Another way is to give a definite rule, as in Example 1.12. If $X$ is infinite and $f$ is
not given by a definite rule, then neither of these methods can be used to specify
the function. Nevertheless, we suppose that a general function $f : X \to Y$ may be
“defined” by picking for each $x \in X$ a corresponding value $f(x) \in Y$.

If $f : X \to Y$ and $U \subset X$, then we denote the restriction of $f$ to $U$ by
$f|_U : U \to Y$, where $f|_U(x) = f(x)$ for $x \in U$.

In defining a function $f : X \to Y$, it is crucial to specify the domain $X$ of
elements on which it is defined. There is more ambiguity about the choice of
codomain, however, since we can extend the codomain to any set $Z \supset Y$ and define
a function $g : X \to Z$ by $g(x) = f(x)$. Strictly speaking, even though $f$ and $g$
have exactly the same values, they are different functions since they have different
codomains. Usually, however, we will ignore this distinction and regard $f$ and $g$ as
being the same function.

The graph of a function $f : X \to Y$ is the subset $G_f$ of $X \times Y$ defined by

$$G_f = \{(x, y) \in X \times Y : x \in X \text{ and } y = f(x)\}.$$

For example, if $f : \mathbb{R} \to \mathbb{R}$, then the graph of $f$ is the usual set of points $(x, y)$ with
$y = f(x)$ in the Cartesian plane $\mathbb{R}^2$. Since a function is defined at every point in
its domain, there is some point $(x, y) \in G_f$ for every $x \in X$, and since the value of
a function is uniquely defined, there is exactly one such point. In other words, for
each $x \in X$ the “vertical line” $L_x = \{(x, y) \in X \times Y : y \in Y\}$ through $x$ intersects
the graph of a function $f : X \to Y$ in exactly one point: $L_x \cap G_f = \{(x, f(x))\}$.

Definition 1.13. The range, or image, of a function $f : X \to Y$ is the set of values

$$\operatorname{ran} f = \{y \in Y : y = f(x) \text{ for some } x \in X\}.$$

A function is onto if its range is all of $Y$; that is, if

for every $y \in Y$ there exists $x \in X$ such that $y = f(x)$.


A function is one-to-one if it maps distinct elements of $X$ to distinct elements of
$Y$; that is, if

$x_1, x_2 \in X$ and $x_1 \neq x_2$ implies that $f(x_1) \neq f(x_2)$.

An onto function is also called a surjection, a one-to-one function an injection, and 
a one-to-one, onto function a bijection. 

Example 1.14. The function $f : A \to B$ defined in Example 1.11 is one-to-one but
not onto, since $5 \notin \operatorname{ran} f$, while the function $g : B \to A$ is onto but not one-to-one,
since $g(5) = g(7)$.
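
For functions on finite sets, the onto and one-to-one properties can be tested by inspecting the values. In the added sketch below the functions of Example 1.11 are stored as dictionaries; `is_onto` and `is_one_to_one` are hypothetical helper names.

```python
A = {2, 3, 5, 7, 11}
B = {1, 3, 5, 7, 9, 11}

f = {2: 7, 3: 1, 5: 11, 7: 3, 11: 9}            # f : A -> B
g = {1: 3, 3: 7, 5: 2, 7: 2, 9: 5, 11: 11}      # g : B -> A


def is_onto(func, codomain):
    """True if every element of the codomain is a value of func."""
    return set(func.values()) == set(codomain)


def is_one_to_one(func):
    """True if distinct arguments always give distinct values."""
    return len(set(func.values())) == len(func)


print(is_one_to_one(f), is_onto(f, B))
print(is_one_to_one(g), is_onto(g, A))
print(B - set(f.values()))      # elements of B missed by f
```

Note that `is_onto` needs the codomain as an extra argument: the dictionary alone records the range, not the set in which the values are declared to lie.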

1.3. Composition and inverses of functions 

The successive application of mappings leads to the notion of the composition of 
functions. 

Definition 1.15. The composition of functions $f : X \to Y$ and $g : Y \to Z$ is the
function $g \circ f : X \to Z$ defined by

$$(g \circ f)(x) = g(f(x)).$$

The order of application of the functions in a composition is crucial and is read
from right to left. The composition $g \circ f$ can only be defined if the domain of
$g$ includes the range of $f$, and the existence of $g \circ f$ does not imply that $f \circ g$ even
makes sense.

Example 1.16. Let $X$ be the set of students in a class and $f : X \to \mathbb{N}$ the function
that maps a student to her age. Let $g : \mathbb{N} \to \mathbb{N}$ be the function that adds up the
digits in a number, e.g., $g(1729) = 19$. If $x \in X$ is 23 years old, then $(g \circ f)(x) = 5$,
but $(f \circ g)(x)$ makes no sense, since students in the class are not natural numbers.

Even if both $g \circ f$ and $f \circ g$ are defined, they are, in general, different functions.

Example 1.17. If $f : A \to B$ and $g : B \to A$ are the functions in Example 1.11,
then $g \circ f : A \to A$ is given by

$$(g \circ f)(2) = 2, \quad (g \circ f)(3) = 3, \quad (g \circ f)(5) = 11, \quad (g \circ f)(7) = 7, \quad (g \circ f)(11) = 5,$$

and $f \circ g : B \to B$ is given by

$$(f \circ g)(1) = 1, \quad (f \circ g)(3) = 3, \quad (f \circ g)(5) = 7, \quad (f \circ g)(7) = 7, \quad (f \circ g)(9) = 11, \quad (f \circ g)(11) = 9.$$
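
The two compositions in Example 1.17 can be recomputed from the tables of values in Example 1.11. The sketch below is an added illustration; `compose` and `digit_sum` are hypothetical helper names, the latter implementing the digit-adding function described in Example 1.16.

```python
def compose(outer, inner):
    """Return outer o inner as a dict: x -> outer[inner[x]]."""
    return {x: outer[y] for x, y in inner.items()}


f = {2: 7, 3: 1, 5: 11, 7: 3, 11: 9}            # f : A -> B (Example 1.11)
g = {1: 3, 3: 7, 5: 2, 7: 2, 9: 5, 11: 11}      # g : B -> A (Example 1.11)

g_after_f = compose(g, f)      # a map A -> A
f_after_g = compose(f, g)      # a map B -> B
print(g_after_f)
print(f_after_g)


def digit_sum(n):
    """The digit-sum function of Example 1.16."""
    return sum(int(d) for d in str(n))


print(digit_sum(1729), digit_sum(23))
```

The two composite dictionaries have different key sets (the elements of A and of B respectively), which makes concrete the point that the two compositions are different functions.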

A one-to-one, onto function $f : X \to Y$ has an inverse $f^{-1} : Y \to X$ defined by

$f^{-1}(y) = x$ if and only if $f(x) = y$.

Equivalently, $f^{-1} \circ f = \mathrm{id}_X$ and $f \circ f^{-1} = \mathrm{id}_Y$. A value $f^{-1}(y)$ is defined for every
$y \in Y$ since $f$ is onto, and it is unique since $f$ is one-to-one. If $f : X \to Y$ is one-to-one
but not onto, then one can still define an inverse function $f^{-1} : \operatorname{ran} f \to X$
whose domain is the range of $f$.

The use of the notation $f^{-1}$ to denote the inverse function should not be confused
with its use to denote the reciprocal function; it should be clear from the
context which meaning is intended.

Example 1.18. If $f : \mathbb{R} \to \mathbb{R}$ is the function $f(x) = x^3$, which is one-to-one and
onto, then the inverse function $f^{-1} : \mathbb{R} \to \mathbb{R}$ is given by

$$f^{-1}(x) = x^{1/3}.$$

On the other hand, the reciprocal function $g = 1/f$ is given by

$$g(x) = \frac{1}{x^3}, \qquad g : \mathbb{R} \setminus \{0\} \to \mathbb{R}.$$

The reciprocal function is not defined at $x = 0$ where $f(x) = 0$.

If $f : X \to Y$ and $A \subset X$, then we let

$$f(A) = \{y \in Y : y = f(x) \text{ for some } x \in A\}$$

denote the set of values of $f$ on points in $A$. Similarly, if $B \subset Y$, we let

$$f^{-1}(B) = \{x \in X : f(x) \in B\}$$

denote the set of points in $X$ whose values belong to $B$. Note that $f^{-1}(B)$ makes
sense as a set even if the inverse function $f^{-1} : Y \to X$ does not exist.

Example 1.19. Define $f : \mathbb{R} \to \mathbb{R}$ by $f(x) = x^2$. If $A = (-2, 2)$, then $f(A) = [0, 4)$.
If $B = (0, 4)$, then

$$f^{-1}(B) = (-2, 0) \cup (0, 2).$$

If $C = (-4, 0)$, then $f^{-1}(C) = \emptyset$.
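
Images and inverse images are easy to compute when the domain is finite. The added sketch below uses the squaring function on an assumed finite domain of integers from -3 to 3 rather than on the real line, so it is an analogue of Example 1.19 and not a computation of it; `image` and `preimage` are hypothetical helper names.

```python
def image(f, domain_subset):
    """f(A) = {f(x) : x in A}."""
    return {f(x) for x in domain_subset}


def preimage(f, domain, codomain_subset):
    """f^{-1}(B) = {x in domain : f(x) in B}; needs no inverse function."""
    return {x for x in domain if f(x) in codomain_subset}


def square(x):
    return x * x


X = set(range(-3, 4))                     # assumed finite domain {-3, ..., 3}

print(image(square, {-1, 0, 1}))          # the values 0 and 1
print(preimage(square, X, {1, 2, 3}))     # the points -1 and 1
print(preimage(square, X, {-4, -1}))      # empty: no square is negative
```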

Finally, we introduce operations on a set. 

Definition 1.20. A binary operation on a set $X$ is a function $f : X \times X \to X$.

We think of $f$ as “combining” two elements of $X$ to give another element
of $X$. One can also consider higher-order operations, such as ternary operations
$f : X \times X \times X \to X$, but we will only use binary operations.

Example 1.21. Addition $a : \mathbb{N} \times \mathbb{N} \to \mathbb{N}$ and multiplication $m : \mathbb{N} \times \mathbb{N} \to \mathbb{N}$ are
binary operations on $\mathbb{N}$ where

$$a(x, y) = x + y, \qquad m(x, y) = xy.$$

1.4. Indexed sets 

We say that a set $X$ is indexed by a set $I$, or $X$ is an indexed set, if there is an
onto function $f : I \to X$. We then write

$$X = \{x_i : i \in I\}$$

where $x_i = f(i)$. For example,

$$\{1, 4, 9, 16, \dots\} = \{n^2 : n \in \mathbb{N}\}.$$

The set $X$ itself is the range of the indexing function $f$, and it doesn’t depend on
how we index it. If $f$ isn’t one-to-one, then some elements are repeated, but this
doesn’t affect the definition of the set $X$. For example,

$$\{-1, 1\} = \{(-1)^n : n \in \mathbb{N}\} = \{(-1)^{n+1} : n \in \mathbb{N}\}.$$

If $\mathcal{C} = \{X_i : i \in I\}$ is an indexed collection of sets $X_i$, then we denote the union
and intersection of the sets in $\mathcal{C}$ by

$$\bigcup_{i \in I} X_i = \{x : x \in X_i \text{ for some } i \in I\}, \qquad \bigcap_{i \in I} X_i = \{x : x \in X_i \text{ for every } i \in I\},$$

or similar notation.

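Example 1.22. For $n \in \mathbb{N}$, define the intervals

$$A_n = [1/n, 1 - 1/n] = \{x \in \mathbb{R} : 1/n \le x \le 1 - 1/n\},$$

$$B_n = (-1/n, 1/n) = \{x \in \mathbb{R} : -1/n < x < 1/n\}.$$

Then

$$\bigcup_{n \in \mathbb{N}} A_n = \bigcup_{n=1}^{\infty} A_n = (0, 1), \qquad \bigcap_{n \in \mathbb{N}} B_n = \bigcap_{n=1}^{\infty} B_n = \{0\}.$$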

The general statement of De Morgan’s laws for a collection of sets is as follows. 

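Proposition 1.23 (De Morgan). If $\{X_i \subset X : i \in I\}$ is a collection of subsets of
a set $X$, then

$$\left(\bigcup_{i \in I} X_i\right)^c = \bigcap_{i \in I} X_i^c, \qquad \left(\bigcap_{i \in I} X_i\right)^c = \bigcup_{i \in I} X_i^c.$$

Proof. We have $x \notin \bigcup_{i \in I} X_i$ if and only if $x \notin X_i$ for every $i \in I$, which holds
if and only if $x \in \bigcap_{i \in I} X_i^c$. Similarly, $x \notin \bigcap_{i \in I} X_i$ if and only if $x \notin X_i$ for some
$i \in I$, which holds if and only if $x \in \bigcup_{i \in I} X_i^c$. □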


The following theorem summarizes how unions and intersections map under 
functions. 


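Theorem 1.24. Let $f : X \to Y$ be a function. If $\{Y_j \subset Y : j \in J\}$ is a collection
of subsets of $Y$, then

$$f^{-1}\left(\bigcup_{j \in J} Y_j\right) = \bigcup_{j \in J} f^{-1}(Y_j), \qquad f^{-1}\left(\bigcap_{j \in J} Y_j\right) = \bigcap_{j \in J} f^{-1}(Y_j);$$

and if $\{X_i \subset X : i \in I\}$ is a collection of subsets of $X$, then

$$f\left(\bigcup_{i \in I} X_i\right) = \bigcup_{i \in I} f(X_i), \qquad f\left(\bigcap_{i \in I} X_i\right) \subset \bigcap_{i \in I} f(X_i).$$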



Proof. We prove only the results for the inverse image of a union and the image
of an intersection; the proof of the remaining two results is similar.

If $x \in f^{-1}\left(\bigcup_{j \in J} Y_j\right)$, then there exists $y \in \bigcup_{j \in J} Y_j$ such that $f(x) = y$. Then
$y \in Y_j$ for some $j \in J$ and $x \in f^{-1}(Y_j)$, so $x \in \bigcup_{j \in J} f^{-1}(Y_j)$. It follows that

$$f^{-1}\left(\bigcup_{j \in J} Y_j\right) \subset \bigcup_{j \in J} f^{-1}(Y_j).$$

Conversely, if $x \in \bigcup_{j \in J} f^{-1}(Y_j)$, then $x \in f^{-1}(Y_j)$ for some $j \in J$, so $f(x) \in Y_j$
and $f(x) \in \bigcup_{j \in J} Y_j$, meaning that $x \in f^{-1}\left(\bigcup_{j \in J} Y_j\right)$. It follows that

$$\bigcup_{j \in J} f^{-1}(Y_j) \subset f^{-1}\left(\bigcup_{j \in J} Y_j\right),$$

which proves that the sets are equal.

If $y \in f\left(\bigcap_{i \in I} X_i\right)$, then there exists $x \in \bigcap_{i \in I} X_i$ such that $f(x) = y$. Then
$x \in X_i$ and $y \in f(X_i)$ for every $i \in I$, meaning that $y \in \bigcap_{i \in I} f(X_i)$. It follows
that

$$f\left(\bigcap_{i \in I} X_i\right) \subset \bigcap_{i \in I} f(X_i). \qquad \Box$$



The only case in which we don’t always have equality is for the image of an
intersection, and we may get strict inclusion here if $f$ is not one-to-one.

Example 1.25. Define $f : \mathbb{R} \to \mathbb{R}$ by $f(x) = x^2$. Let $A = (-1, 0)$ and $B = (0, 1)$.
Then $A \cap B = \emptyset$ and $f(A \cap B) = \emptyset$, but $f(A) = f(B) = (0, 1)$, so $f(A) \cap f(B) =
(0, 1) \neq f(A \cap B)$.
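
The strict inclusion in Example 1.25 can be seen with finitely many sample points. In the added sketch below, three rational points of the interval from -1 to 0 and their mirror images stand in for the two intervals; this sampling is an assumption of the illustration, and `image` is a hypothetical helper name.

```python
from fractions import Fraction


def image(f, subset):
    """f(S) = {f(x) : x in S}."""
    return {f(x) for x in subset}


def square(x):
    return x * x


# Finite samples (an assumption of this sketch) of A = (-1, 0) and B = (0, 1).
A = {Fraction(-k, 4) for k in (1, 2, 3)}
B = {Fraction(k, 4) for k in (1, 2, 3)}

lhs = image(square, A & B)                    # f(A n B)
rhs = image(square, A) & image(square, B)     # f(A) n f(B)
print(lhs, rhs)
print(lhs <= rhs, lhs == rhs)                 # inclusion holds, equality fails
```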

Next, we generalize the Cartesian product of finitely many sets to the product 
of possibly infinitely many sets. 

Definition 1.26. Let $\mathcal{C} = \{X_i : i \in I\}$ be an indexed collection of sets $X_i$. The
Cartesian product of $\mathcal{C}$ is the set of functions that assign to each index $i \in I$ an
element $x_i \in X_i$. That is,

$$\prod_{i \in I} X_i = \left\{ f : I \to \bigcup_{i \in I} X_i : f(i) \in X_i \text{ for every } i \in I \right\}.$$

For example, if $I = \{1, 2, \dots, n\}$, then $f$ defines an ordered $n$-tuple of elements
$(x_1, x_2, \dots, x_n)$ with $x_i = f(i) \in X_i$, so this definition is equivalent to our previous
one.

If $X_i = X$ for every $i \in I$, then $\prod_{i \in I} X_i$ is simply the set of functions from $I$
to $X$, and we also write it as

$$X^I = \{f : I \to X\}.$$

We can think of this set as the set of ordered $I$-tuples of elements of $X$.

Example 1.27. A sequence of real numbers $(x_1, x_2, x_3, \dots, x_n, \dots) \in \mathbb{R}^{\mathbb{N}}$ is a
function $f : \mathbb{N} \to \mathbb{R}$. We study sequences and their convergence properties in
Chapter 3.

Example 1.28. Let $2 = \{0, 1\}$ be a set with two elements. Then a subset $A \subset I$
can be identified with its characteristic function $\chi_A : I \to 2$ by: $i \in A$ if and only
if $\chi_A(i) = 1$. Thus, $A \mapsto \chi_A$ is a one-to-one map from $\mathcal{P}(I)$ onto $2^I$.
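
The identification in Example 1.28 between subsets and characteristic functions can be checked exhaustively for a small index set. The added sketch below assumes a three-element index set and represents a function from it to {0, 1} as a tuple of bits; `characteristic` and `subset_of` are hypothetical helper names.

```python
from itertools import product

I = [1, 2, 3]      # a small index set chosen for illustration


def characteristic(subset):
    """Return chi_A as a tuple (chi_A(i) for i in I), with values in {0, 1}."""
    return tuple(1 if i in subset else 0 for i in I)


def subset_of(chi):
    """Recover A from chi_A: i belongs to A exactly when chi_A(i) = 1."""
    return frozenset(i for i, bit in zip(I, chi) if bit == 1)


all_functions = list(product((0, 1), repeat=len(I)))   # the set 2^I
all_subsets = {subset_of(chi) for chi in all_functions}

print(len(all_functions), len(all_subsets))            # both equal 2**3
print(all(characteristic(subset_of(chi)) == chi for chi in all_functions))
```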

Before giving another example, we introduce some convenient notation.

Definition 1.29. Let

$$\Sigma = \{(s_1, s_2, s_3, \dots, s_k, \dots) : s_k = 0, 1\}$$

denote the set of all binary sequences; that is, sequences whose terms are either 0
or 1.

Example 1.30. Let $2 = \{0, 1\}$. Then $\Sigma = 2^{\mathbb{N}}$, where we identify a sequence
$(s_1, s_2, \dots, s_k, \dots)$ with the function $f : \mathbb{N} \to 2$ such that $s_k = f(k)$. We can
also identify $\Sigma$ and $2^{\mathbb{N}}$ with $\mathcal{P}(\mathbb{N})$ as in Example 1.28. For example, the sequence
$(1, 0, 1, 0, 1, \dots)$ of alternating ones and zeros corresponds to the function $f : \mathbb{N} \to 2$
defined by

$$f(k) = \begin{cases} 1 & \text{if } k \text{ is odd}, \\ 0 & \text{if } k \text{ is even}, \end{cases}$$

and to the set $\{1, 3, 5, 7, \dots\} \subset \mathbb{N}$ of odd natural numbers.


1.5. Relations 

A binary relation $R$ on sets $X$ and $Y$ is a definite relation between elements of $X$
and elements of $Y$. We write $xRy$ if $x \in X$ and $y \in Y$ are related. One can also
define relations on more than two sets, but we shall consider only binary relations
and refer to them simply as relations. If $X = Y$, then we call $R$ a relation on $X$.

Example 1.31. Suppose that $S$ is a set of students enrolled in a university and $B$
is a set of books in a library. We might define a relation $R$ on $S$ and $B$ by:

$s \in S$ has read $b \in B$.

In that case, $sRb$ if and only if $s$ has read $b$. Another, probably inequivalent,
relation is:

$s \in S$ has checked $b \in B$ out of the library.

When used informally, relations may be ambiguous (did s read b if she only 
read the first page?), but in mathematical usage we always require that relations 
are definite, meaning that one and only one of the statements “these elements are 
related” or “these elements are not related” is true. 

The graph $G_R$ of a relation $R$ on $X$ and $Y$ is the subset of $X \times Y$ defined by

$$G_R = \{(x, y) \in X \times Y : xRy\}.$$

This graph contains all of the information about which elements are related. Conversely,
any subset $G \subset X \times Y$ defines a relation $R$ by: $xRy$ if and only if $(x, y) \in G$.
Thus, a relation on $X$ and $Y$ may be (and often is) defined as a subset of $X \times Y$. As
for sets, it doesn’t matter how a relation is defined, only what elements are related.

A function $f : X \to Y$ determines a relation $F$ on $X$ and $Y$ by: $xFy$ if and
only if $y = f(x)$. Thus, functions are a special case of relations. The graph $G_R$ of
a general relation differs from the graph $G_F$ of a function in two ways: there may
be elements $x \in X$ such that $(x, y) \notin G_R$ for any $y \in Y$, and there may be $x \in X$
such that $(x, y) \in G_R$ for many $y \in Y$.

For example, in the case of the relation $R$ in Example 1.31, there may be some
students who haven’t read any books, and there may be other students who have
read lots of books, in which case we don’t have a well-defined function from students
to books.

Two important types of relations are orders and equivalence relations, and we 
define them next. 

1.5.1. Orders. A primary example of an order is the standard order $\le$ on the
natural (or real) numbers. This order is a linear or total order, meaning that two
numbers are always comparable. Another example of an order is inclusion $\subset$ on the
power set of some set; one set is “smaller” than another set if it is included in it.
This order is a partial order (provided the original set has at least two elements),
meaning that two subsets need not be comparable.

Example 1.32. Let $X = \{1, 2\}$. The collection of subsets of $X$ is

$$\mathcal{P}(X) = \{\emptyset, A, B, X\}, \qquad A = \{1\}, \quad B = \{2\}.$$

We have $\emptyset \subset A \subset X$ and $\emptyset \subset B \subset X$, but $A \not\subset B$ and $B \not\subset A$, so $A$ and $B$ are not
comparable under ordering by inclusion.

The general definition of an order is as follows. 

Definition 1.33. An order $\preceq$ on a set $X$ is a binary relation on $X$ such that for
every $x, y, z \in X$:

(a) $x \preceq x$ (reflexivity);

(b) if $x \preceq y$ and $y \preceq x$ then $x = y$ (antisymmetry);

(c) if $x \preceq y$ and $y \preceq z$ then $x \preceq z$ (transitivity).

An order is a linear, or total, order if for every $x, y \in X$ either $x \preceq y$ or $y \preceq x$,
otherwise it is a partial order.

If $\preceq$ is an order, then we also write $y \succeq x$ instead of $x \preceq y$, and we define a
corresponding strict order $\prec$ by

$x \prec y$ if $x \preceq y$ and $x \neq y$.

There are many ways to order a given set (with two or more elements). 

Example 1.34. Let $X$ be a set. One way to partially order the subsets of $X$ is by
inclusion, as in Example 1.32. Another way is to say that $A \preceq B$ for $A, B \subset X$ if
and only if $A \supset B$, meaning that $A$ is “smaller” than $B$ if $A$ includes $B$. Then $\preceq$
is an order on $\mathcal{P}(X)$, called ordering by reverse inclusion.
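
On a finite set the three order axioms, and totality, can be verified by brute force. The added sketch below checks inclusion and reverse inclusion on the power set of a two-element set, together with the usual order on a few integers; `power_set`, `is_order`, and `is_total` are hypothetical helper names.

```python
from itertools import combinations


def power_set(elements):
    """List all subsets of `elements` as frozensets."""
    items = list(elements)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def is_order(elements, leq):
    """Check reflexivity, antisymmetry, and transitivity of leq on elements."""
    reflexive = all(leq(x, x) for x in elements)
    antisymmetric = all(x == y for x in elements for y in elements
                        if leq(x, y) and leq(y, x))
    transitive = all(leq(x, z) for x in elements for y in elements
                     for z in elements if leq(x, y) and leq(y, z))
    return reflexive and antisymmetric and transitive


def is_total(elements, leq):
    """True if every two elements are comparable."""
    return all(leq(x, y) or leq(y, x) for x in elements for y in elements)


P = power_set({1, 2})
inclusion = lambda a, b: a <= b            # a is a subset of b
reverse_inclusion = lambda a, b: a >= b    # a is a superset of b

print(is_order(P, inclusion), is_total(P, inclusion))
print(is_order(P, reverse_inclusion), is_total(P, reverse_inclusion))
print(is_total([1, 2, 3, 4], lambda m, n: m <= n))
```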

1.5.2. Equivalence relations. Equivalence relations decompose a set into disjoint
subsets, called equivalence classes. We begin with an example of an equivalence
relation on $\mathbb{N}$.

Example 1.35. Fix $N \in \mathbb{N}$ and say that $m \sim n$ if

$$m \equiv n \pmod{N},$$

meaning that $m - n$ is divisible by $N$. Two numbers are related by $\sim$ if they have
the same remainder when divided by $N$. Moreover, $\mathbb{N}$ is the union of $N$ equivalence
classes, consisting of numbers with remainders $0, 1, \dots, N - 1$ modulo $N$.

The definition of an equivalence relation differs from the definition of an order
only by changing antisymmetry to symmetry, but order relations and equivalence
relations have completely different properties.

Definition 1.36. An equivalence relation $\sim$ on a set $X$ is a binary relation on $X$
such that for every $x, y, z \in X$:

(a) $x \sim x$ (reflexivity);

(b) if $x \sim y$ then $y \sim x$ (symmetry);

(c) if $x \sim y$ and $y \sim z$ then $x \sim z$ (transitivity).

For each $x \in X$, the set of elements equivalent to $x$,

$$[x/\sim] = \{y \in X : x \sim y\},$$

is called the equivalence class of $x$ with respect to $\sim$. When the equivalence relation
is understood, we write the equivalence class $[x/\sim]$ simply as $[x]$. The set of
equivalence classes of an equivalence relation $\sim$ on a set $X$ is denoted by $X/\sim$.
Note that each element of $X/\sim$ is a subset of $X$, so $X/\sim$ is a subset of the power
set $\mathcal{P}(X)$ of $X$.

The following theorem is the basic result about equivalence relations. It says 
that an equivalence relation on a set partitions the set into disjoint equivalence 
classes. 

Theorem 1.37. Let $\sim$ be an equivalence relation on a set $X$. Every equivalence
class is non-empty, and $X$ is the disjoint union of the equivalence classes of $\sim$.

Proof. If $x \in X$, then the reflexivity of $\sim$ implies that $x \in [x]$. Therefore every
equivalence class is non-empty and the union of the equivalence classes is $X$.

To prove that the union is disjoint, we show that for every $x, y \in X$ either
$[x] \cap [y] = \emptyset$ (if $x \not\sim y$) or $[x] = [y]$ (if $x \sim y$).

Suppose that $[x] \cap [y] \neq \emptyset$. Let $z \in [x] \cap [y]$ be an element in both equivalence
classes. If $x_1 \in [x]$, then $x_1 \sim z$ and $z \sim y$, so $x_1 \sim y$ by the transitivity of $\sim$, and
therefore $x_1 \in [y]$. It follows that $[x] \subset [y]$. A similar argument applied to $y_1 \in [y]$
implies that $[y] \subset [x]$, and therefore $[x] = [y]$. In particular, $y \in [x]$, so $x \sim y$. On
the other hand, if $[x] \cap [y] = \emptyset$, then $y \notin [x]$ since $y \in [y]$, so $x \not\sim y$. □

There is a natural projection $\pi : X \to X/\sim$, given by $\pi(x) = [x]$, that maps
each element of $X$ to the equivalence class that contains it. Conversely, we can
index the collection of equivalence classes

$$X/\sim \; = \{[a] : a \in A\}$$

by a subset $A$ of $X$ which contains exactly one element from each equivalence class.
It is important to recognize, however, that such an indexing involves an arbitrary
choice of a representative element from each equivalence class, and it is better
to think in terms of the collection of equivalence classes, rather than a subset of
elements.

Example 1.38. The equivalence classes of $\mathbb{N}$ relative to the equivalence relation
$m \sim n$ if $m \equiv n \pmod{3}$ are given by

$$I_0 = \{3, 6, 9, \dots\}, \quad I_1 = \{1, 4, 7, \dots\}, \quad I_2 = \{2, 5, 8, \dots\}.$$

The projection $\pi : \mathbb{N} \to \{I_0, I_1, I_2\}$ maps a number to its equivalence class, e.g.
$\pi(101) = I_2$. We can choose $\{1, 2, 3\}$ as a set of representative elements, in which
case

$$I_0 = [3], \quad I_1 = [1], \quad I_2 = [2],$$

but any other set $A \subset \mathbb{N}$ of three numbers with remainders 0, 1, 2 (mod 3) will do.
For example, if we choose $A = \{7, 15, 101\}$, then

$$I_0 = [15], \quad I_1 = [7], \quad I_2 = [101].$$
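
As a rough illustration of Example 1.38, the following Python sketch groups a finite window of natural numbers by remainder and evaluates the projection. The names `equivalence_classes` and `projection` and the cutoff 12 are choices made here for illustration, not notation from the text; each class is labelled by its remainder because the full classes are infinite.

```python
from collections import defaultdict

def equivalence_classes(elements, N):
    """Group elements by remainder modulo N (m ~ n iff N divides m - n)."""
    classes = defaultdict(list)
    for m in elements:
        classes[m % N].append(m)
    return dict(classes)

def projection(m, N):
    """Label of the class containing m, namely its remainder modulo N."""
    return m % N

# The numbers 1..12 stand in for the (infinite) set of natural numbers.
classes = equivalence_classes(range(1, 13), 3)
for remainder in sorted(classes):
    print(remainder, classes[remainder])

# 7, 15 and 101 have remainders 1, 0 and 2, so they represent all three classes.
labels = sorted(projection(a, 3) for a in (7, 15, 101))
```

Any other choice of one representative per class would give the same three classes, which is the point made in the text about indexing being an arbitrary choice.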

1.6. Countable and uncountable sets 

One way to show that two sets have the same “size” is to pair off their elements. 
For example, if we can match up every left shoe in a closet with a right shoe, with 
no right shoes left over, then we know that we have the same number of left and 
right shoes. That is, we have the same number of left and right shoes if there is a 
one-to-one, onto map $f : L \to R$, or one-to-one correspondence, from the set $L$ of
left shoes to the set R of right shoes. 

We refer to the “size” of a set as measured by one-to-one correspondences as 
its cardinality. This notion enables us to compare the cardinality of both finite and 
infinite sets. In particular, we can use it to distinguish between “smaller” countably 
infinite sets, such as the integers or rational numbers, and “larger” uncountably 
infinite sets, such as the real numbers. 

Definition 1.39. Two sets $X$, $Y$ have equal cardinality, written $X \approx Y$, if there
is a one-to-one, onto map $f : X \to Y$. The cardinality of $X$ is less than or equal to
the cardinality of $Y$, written $X \lesssim Y$, if there is a one-to-one (but not necessarily
onto) map $g : X \to Y$.

If $X \approx Y$, then we also say that $X$, $Y$ have the same cardinality. We don’t
define the notion of a “cardinal number” here, only the relation between sets of
“equal cardinality.”

Note that $\approx$ is an equivalence relation on any collection of sets. In particular,
it is transitive because if $X \approx Y$ and $Y \approx Z$, then there are one-to-one and onto
maps $f : X \to Y$ and $g : Y \to Z$, so $g \circ f : X \to Z$ is one-to-one and onto, and
$X \approx Z$. We may therefore divide any collection of sets into equivalence classes of
sets with equal cardinality.

It follows immediately from the definition that $\lesssim$ is reflexive and transitive.
Furthermore, as stated in the following Schröder-Bernstein theorem, if $X \lesssim Y$ and
$Y \lesssim X$, then $X \approx Y$. This result allows us to prove that two sets have equal
cardinality by constructing one-to-one maps that need not be onto. The statement
of the theorem is intuitively obvious but the proof, while elementary, is surprisingly
involved and can be omitted without loss of continuity. (We will only use the
theorem once, in the proof of Theorem 5.67.)

Theorem 1.40 (* Schröder-Bernstein). If $X$, $Y$ are sets such that there are one-
to-one maps $f : X \to Y$ and $g : Y \to X$, then there is a one-to-one, onto map
$h : X \to Y$.

Proof. We divide $X$ into three disjoint subsets $X_X$, $X_Y$, $X_\infty$ with different mapping
properties as follows.

Consider a point $x_1 \in X$. If $x_1$ is not in the range of $g$, then we say $x_1 \in X_X$.
Otherwise there exists $y_1 \in Y$ such that $g(y_1) = x_1$, and $y_1$ is unique since $g$ is
one-to-one. If $y_1$ is not in the range of $f$, then we say $x_1 \in X_Y$. Otherwise there
exists a unique $x_2 \in X$ such that $f(x_2) = y_1$. Continuing in this way, we generate
a sequence of points

$$x_1, y_1, x_2, y_2, \dots, x_n, y_n, x_{n+1}, \dots$$

with $x_n \in X$, $y_n \in Y$ and

$$g(y_n) = x_n, \qquad f(x_{n+1}) = y_n.$$

We assign the starting point $x_1$ to a subset in the following way: (a) $x_1 \in X_X$ if
the sequence terminates at some $x_n \in X$ that isn’t in the range of $g$; (b) $x_1 \in X_Y$
if the sequence terminates at some $y_n \in Y$ that isn’t in the range of $f$; (c) $x_1 \in X_\infty$
if the sequence never terminates.

Similarly, if $y_1 \in Y$, then we generate a sequence of points

$$y_1, x_1, y_2, x_2, \dots, y_n, x_n, y_{n+1}, \dots$$

with $x_n \in X$, $y_n \in Y$ by

$$f(x_n) = y_n, \qquad g(y_{n+1}) = x_n,$$

and we assign $y_1$ to a subset $Y_X$, $Y_Y$, or $Y_\infty$ of $Y$ as follows: (a) $y_1 \in Y_X$ if the
sequence terminates at some $x_n \in X$ that isn’t in the range of $g$; (b) $y_1 \in Y_Y$ if the
sequence terminates at some $y_n \in Y$ that isn’t in the range of $f$; (c) $y_1 \in Y_\infty$ if the
sequence never terminates.

We claim that $f : X_X \to Y_X$ is one-to-one and onto. First, if $x \in X_X$, then
$f(x) \in Y_X$ because the sequence generated by $f(x)$ coincides with the sequence
generated by $x$ after its first term, so both sequences terminate at a point in $X$.
Second, if $y \in Y_X$, then there is $x \in X$ such that $f(x) = y$, otherwise the sequence
would terminate at $y \in Y$, meaning that $y \in Y_Y$. Furthermore, we must have
$x \in X_X$ because the sequence generated by $x$ is a continuation of the sequence
generated by $y$ and therefore also terminates at a point in $X$. Finally, $f$ is one-to-
one on $X_X$ since $f$ is one-to-one on $X$.

The same argument applied to $g : Y_Y \to X_Y$ implies that $g$ is one-to-one and
onto, so $g^{-1} : X_Y \to Y_Y$ is one-to-one and onto.

Finally, similar arguments show that $f : X_\infty \to Y_\infty$ is one-to-one and onto: If
$x \in X_\infty$, then the sequence generated by $f(x) \in Y$ doesn’t terminate, so $f(x) \in Y_\infty$;
and every $y \in Y_\infty$ is the image of a point $x \in X$ which, like $y$, generates a sequence
that does not terminate, so $x \in X_\infty$.

It then follows that $h : X \to Y$ defined by

$$h(x) = \begin{cases} f(x) & \text{if } x \in X_X, \\ g^{-1}(x) & \text{if } x \in X_Y, \\ f(x) & \text{if } x \in X_\infty \end{cases}$$

is a one-to-one, onto map from $X$ to $Y$. □

We can use the cardinality relation to describe the “size” of a set by comparing
it with standard sets.

Definition 1.41. A set $X$ is:

(1) Finite if it is the empty set or $X \approx \{1, 2, \dots, n\}$ for some $n \in \mathbb{N}$;

(2) Countably infinite (or denumerable) if $X \approx \mathbb{N}$;

(3) Infinite if it is not finite;

(4) Countable if it is finite or countably infinite;

(5) Uncountable if it is not countable.

We’ll take for granted some intuitively obvious facts which follow from the
definitions. For example, a finite, non-empty set is in one-to-one correspondence
with $\{1, 2, \dots, n\}$ for a unique natural number $n \in \mathbb{N}$ (the number of elements in
the set), a countably infinite set is not finite, and a subset of a countable set is
countable.

According to Definition 1.41, we may divide sets into disjoint classes of finite, 
countably infinite, and uncountable sets. We also distinguish between finite and 
infinite sets, and countable and uncountable sets. We will show below, in Theo¬ 
rem 2.19, that the set of real numbers is uncountable, and we refer to its cardinality 
as the cardinality of the continuum. 

Definition 1.42. A set $X$ has the cardinality of the continuum if $X \approx \mathbb{R}$.

One has to be careful in extrapolating properties of finite sets to infinite sets.

Example 1.43. The set of squares

$$S = \{1, 4, 9, 16, \dots, n^2, \dots\}$$

is countably infinite since $f : \mathbb{N} \to S$ defined by $f(n) = n^2$ is one-to-one and onto.
It may appear surprising at first that the set $\mathbb{N}$ can be in one-to-one correspondence
with an apparently “smaller” proper subset $S$, since this doesn’t happen for finite
sets. In fact, assuming the axiom of choice, one can show that a set is infinite if
and only if it has the same cardinality as a proper subset. Dedekind (1888) used
this property to give a definition of infinite sets that did not depend on the natural
numbers $\mathbb{N}$.

Next, we prove some results about countable sets. The following proposition 
states a useful necessary and sufficient condition for a set to be countable. 

Proposition 1.44. A non-empty set $X$ is countable if and only if there is an onto
map $f : \mathbb{N} \to X$.

Proof. If $X$ is countably infinite, then there is a one-to-one, onto map $f : \mathbb{N} \to X$.
If $X$ is finite and non-empty, then for some $n \in \mathbb{N}$ there is a one-to-one, onto map
$g : \{1, 2, \dots, n\} \to X$. Choose any $x \in X$ and define the onto map $f : \mathbb{N} \to X$ by

$$f(k) = \begin{cases} g(k) & \text{if } k = 1, 2, \dots, n, \\ x & \text{if } k = n + 1, n + 2, \dots. \end{cases}$$

Conversely, suppose that such an onto map exists. We define a one-to-one, onto
map $g$ recursively by omitting repeated values of $f$. Explicitly, let $g(1) = f(1)$.
Suppose that $n \ge 1$ and we have chosen $n$ distinct $g$-values $g(1), g(2), \dots, g(n)$. Let

$$A_n = \{k \in \mathbb{N} : f(k) \ne g(j) \text{ for every } j = 1, 2, \dots, n\}$$

denote the set of natural numbers whose $f$-values are not already included among
the $g$-values. If $A_n = \emptyset$, then $g : \{1, 2, \dots, n\} \to X$ is one-to-one and onto, and $X$
is finite. Otherwise, let $k_n = \min A_n$, and define $g(n + 1) = f(k_n)$, which is distinct
from all of the previous $g$-values. Either this process terminates, and $X$ is finite, or
we go through all the $f$-values and obtain a one-to-one, onto map $g : \mathbb{N} \to X$, and
$X$ is countably infinite. □
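
The recursive construction in the second half of the proof can be sketched in Python for a finite stretch of $f$-values. The function name `enumerate_without_repeats`, the sample map `f`, and the cutoff `limit` are assumptions made here for illustration; scanning $k = 1, 2, \dots$ in order and keeping the first index with a new value is the same as choosing $k_n = \min A_n$.

```python
def enumerate_without_repeats(f, limit):
    """Return g(1), g(2), ... built from f(1), ..., f(limit) by skipping repeats."""
    g_values = []
    seen = set()
    for k in range(1, limit + 1):
        value = f(k)
        if value not in seen:       # k is the least index whose f-value is new
            seen.add(value)
            g_values.append(value)
    return g_values

def f(k):
    """Hypothetical onto map from the natural numbers to {0, 1, 2, 3}, with repeats."""
    return (k // 2) % 4

g_values = enumerate_without_repeats(f, 20)
print(g_values)
```

Here the process stops producing new values after four steps, which corresponds to the case $A_n = \emptyset$ in which $X$ is finite.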

If $X$ is a countable set, then we refer to an onto function $f : \mathbb{N} \to X$ as an
enumeration of $X$, and write $X = \{x_n : n \in \mathbb{N}\}$, where $x_n = f(n)$.

Proposition 1.45. The Cartesian product $\mathbb{N} \times \mathbb{N}$ is countably infinite.

Proof. Define a linear order $\prec$ on ordered pairs of natural numbers as follows:

$(m, n) \prec (m', n')$ if either $m + n < m' + n'$ or $m + n = m' + n'$ and $n < n'$.

That is, we arrange $\mathbb{N} \times \mathbb{N}$ in a table

(1,1)  (1,2)  (1,3)  (1,4)  ...
(2,1)  (2,2)  (2,3)  (2,4)  ...
(3,1)  (3,2)  (3,3)  (3,4)  ...
(4,1)  (4,2)  (4,3)  (4,4)  ...
 ...    ...    ...    ...

and list it along successive diagonals from bottom-left to top-right as

(1,1), (2,1), (1,2), (3,1), (2,2), (1,3), (4,1), (3,2), (2,3), (1,4), ....

We define $f : \mathbb{N} \to \mathbb{N} \times \mathbb{N}$ by setting $f(n)$ equal to the $n$th pair in this order;
for example, $f(7) = (4, 1)$. Then $f$ is one-to-one and onto, so $\mathbb{N} \times \mathbb{N}$ is countably
infinite. □
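
The diagonal listing used in this proof can be generated mechanically. The sketch below is one possible way to do it in Python; the names `diagonal_pairs` and `f` are chosen here for illustration, and `f` simply walks the generator to its $k$th term, which is not efficient but mirrors the definition.

```python
from itertools import count, islice

def diagonal_pairs():
    """Yield the pairs (m, n) of natural numbers in the order used in the proof."""
    for total in count(2):            # total = m + n is constant on each diagonal
        for n in range(1, total):     # within a diagonal, n increases
            yield (total - n, n)

def f(k):
    """The k-th pair in the diagonal order, counting from k = 1."""
    return next(islice(diagonal_pairs(), k - 1, None))

first_ten = list(islice(diagonal_pairs(), 10))
print(first_ten)
print(f(7))
```

Every pair $(m, n)$ appears exactly once, on the diagonal $m + n$, which is why the resulting map is one-to-one and onto.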

Theorem 1.46. A countable union of countable sets is countable. 

Proof. Let $\{X_n : n \in \mathbb{N}\}$ be a countable collection of countable sets. From Proposition
1.44, there is an onto map $f_n : \mathbb{N} \to X_n$. We define

$$g : \mathbb{N} \times \mathbb{N} \to \bigcup_{n \in \mathbb{N}} X_n$$

by $g(n, k) = f_n(k)$. Then $g$ is also onto. From Proposition 1.45, there is a one-to-
one, onto map $h : \mathbb{N} \to \mathbb{N} \times \mathbb{N}$, and it follows that

$$g \circ h : \mathbb{N} \to \bigcup_{n \in \mathbb{N}} X_n$$

is onto, so Proposition 1.44 implies that the union of the $X_n$ is countable. □

The next theorem gives a fundamental example of an uncountable set, namely
the set of all subsets of natural numbers. The proof uses a “diagonal” argument due
to Cantor (1891), which is of frequent use in analysis. Recall from Definition 1.1
that the power set of a set is the collection of all its subsets.

Theorem 1.47. The power set $\mathcal{P}(\mathbb{N})$ of $\mathbb{N}$ is uncountable.

Proof. Let $C \subset \mathcal{P}(\mathbb{N})$ be a countable collection of subsets of $\mathbb{N}$,

$$C = \{A_n \subset \mathbb{N} : n \in \mathbb{N}\}.$$

Define a subset $A \subset \mathbb{N}$ by

$$A = \{n \in \mathbb{N} : n \notin A_n\}.$$

Then $A \ne A_n$ for every $n \in \mathbb{N}$ since either $n \in A$ and $n \notin A_n$ or $n \notin A$ and $n \in A_n$.
Thus, $A \notin C$. It follows that no countable collection of subsets of $\mathbb{N}$ includes all of
the subsets of $\mathbb{N}$, so $\mathcal{P}(\mathbb{N})$ is uncountable. □

This theorem has an immediate corollary for the set $\Sigma$ of binary sequences
defined in Definition 1.29.

Corollary 1.48. The set $\Sigma$ of binary sequences has the same cardinality as $\mathcal{P}(\mathbb{N})$
and is uncountable.

Proof. By Example 1.30, the set $\Sigma$ is in one-to-one correspondence with $\mathcal{P}(\mathbb{N})$,
which is uncountable. □

It is instructive to write the diagonal argument in terms of binary sequences.
Suppose that $S = \{s_n \in \Sigma : n \in \mathbb{N}\}$ is a countable set of binary sequences that
begins, for example, as follows

$$\begin{aligned}
s_1 &= 0\,0\,1\,1\,0\,1 \dots \\
s_2 &= 1\,1\,0\,0\,1\,0 \dots \\
s_3 &= 1\,1\,0\,1\,1\,0 \dots \\
s_4 &= 0\,1\,1\,0\,0\,0 \dots \\
s_5 &= 1\,0\,0\,1\,1\,1 \dots \\
s_6 &= 1\,0\,0\,1\,0\,0 \dots
\end{aligned}$$

Then we get a sequence $s \notin S$ by going down the diagonal and switching the values
from 0 to 1 or from 1 to 0. For the previous sequences, this gives

$$s = 1\,0\,1\,1\,0\,1 \dots.$$
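
The diagonal switch is easy to check on the six displayed prefixes. The following Python sketch, with the function name `diagonal_complement` chosen here for illustration, flips the $n$th digit of the $n$th sequence; it only works with finite prefixes, so it illustrates the construction rather than proving anything about infinite sequences.

```python
def diagonal_complement(sequences):
    """Flip the n-th digit of the n-th sequence, giving a string that differs from each one."""
    return "".join("1" if s[n] == "0" else "0" for n, s in enumerate(sequences))

# The first six digits of s_1, ..., s_6 from the text.
listed = ["001101", "110010", "110110", "011000", "100111", "100100"]
s = diagonal_complement(listed)
print(s)
```

By construction `s` disagrees with the $n$th listed sequence in position $n$, so it cannot equal any of them.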

We will show in Theorem 5.67 below that $\Sigma$ and $\mathcal{P}(\mathbb{N})$ are also in one-to-one
correspondence with $\mathbb{R}$, so both have the cardinality of the continuum.

A similar diagonal argument to the one used in Theorem 1.47 shows that for
every set $X$ the cardinality of the power set $\mathcal{P}(X)$ is strictly greater than the
cardinality of $X$. In particular, the cardinality of $\mathcal{P}(\mathcal{P}(\mathbb{N}))$ is strictly greater than
the cardinality of $\mathcal{P}(\mathbb{N})$, the cardinality of $\mathcal{P}(\mathcal{P}(\mathcal{P}(\mathbb{N})))$ is strictly greater than
the cardinality of $\mathcal{P}(\mathcal{P}(\mathbb{N}))$, and so on. Thus, there are many other uncountable
cardinalities apart from the cardinality of the continuum.

Cantor (1878) raised the question of whether or not there are any sets whose
cardinality lies strictly between that of $\mathbb{N}$ and $\mathcal{P}(\mathbb{N})$. The statement that there
are no such sets is called the continuum hypothesis, which may be formulated as
follows.

Hypothesis 1.49 (Continuum). If $C \subset \mathcal{P}(\mathbb{N})$ is infinite, then either $C \approx \mathbb{N}$ or
$C \approx \mathcal{P}(\mathbb{N})$.

The work of Gödel (1940) and Cohen (1963) established the remarkable result
that the continuum hypothesis cannot be proved or disproved from the standard
axioms of set theory (assuming, as we believe to be the case, that these axioms
are consistent). This result illustrates a fundamental and unavoidable incompleteness
in the ability of any finite system of axioms to capture the properties of any
mathematical structure that is rich enough to include the natural numbers.

Chapter six:
Numbers and functions


The subject of this course is “functions of one real variable” so we begin by wondering what a real number
“really” is, and then, in the next section, what a function is.


1. What is a number? 

1.1. Different kinds of numbers. The simplest numbers are the positive integers

$$1, 2, 3, 4, \dots$$

the number zero

$$0,$$

and the negative integers

$$\dots, -4, -3, -2, -1.$$

Together these form the integers or “whole numbers.”

Next, there are the numbers you get by dividing one whole number by another (nonzero) whole number.
These are the so-called fractions or rational numbers such as

$$\frac{1}{2}, \frac{1}{3}, \frac{2}{3}, \frac{1}{4}, \frac{2}{4}, \frac{3}{4}, \frac{4}{3}, \dots$$

or

$$-\frac{1}{2}, -\frac{1}{3}, -\frac{2}{3}, -\frac{1}{4}, -\frac{2}{4}, -\frac{3}{4}, -\frac{4}{3}, \dots$$

By definition, any whole number is a rational number (in particular, zero is a rational number).

You can add, subtract, multiply and divide any pair of rational numbers and the result will again be a
rational number (provided you don’t try to divide by zero).

One day in middle school you were told that there are other numbers besides the rational numbers, and
the first example of such a number is the square root of two. It has been known ever since the time of the
Greeks that no rational number exists whose square is exactly 2, i.e. you can’t find a fraction $\frac{m}{n}$ such that

$$\left(\frac{m}{n}\right)^2 = 2, \quad \text{i.e. } m^2 = 2n^2.$$

Nevertheless, if you compute $x^2$ for some values of $x$ between 1 and 2, and check if you
get more or less than 2, then it looks like there should be some number $x$ between 1.4 and
1.5 whose square is exactly 2. So, we assume that there is such a number, and we call it
the square root of 2, written as $\sqrt{2}$. This raises several questions. How do we know there
really is a number between 1.4 and 1.5 for which $x^2 = 2$? How many other such numbers
are we going to assume into existence? Do these new numbers obey the same algebra rules
(like $a + b = b + a$) as the rational numbers? If we knew precisely what these numbers (like
$\sqrt{2}$) were then we could perhaps answer such questions. It turns out to be rather difficult to give a precise
description of what a number is, and in this course we won’t try to get anywhere near the bottom of this
issue. Instead we will think of numbers as “infinite decimal expansions” as follows.
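
The experiment of squaring values between 1 and 2 and checking whether the result is more or less than 2 can be automated. The Python sketch below is one way to do it, using exact rational arithmetic and repeated halving of an interval; the function name `bracket_sqrt2` and the number of steps are choices made here, not part of the text.

```python
from fractions import Fraction

def bracket_sqrt2(steps):
    """Shrink an interval [lo, hi] of rationals with lo^2 < 2 < hi^2 by repeated halving."""
    lo, hi = Fraction(1), Fraction(2)
    for _ in range(steps):
        mid = (lo + hi) / 2
        if mid * mid < 2:      # mid is rational, so mid^2 can never equal 2
            lo = mid
        else:
            hi = mid
    return lo, hi

lo, hi = bracket_sqrt2(20)
print(float(lo), float(hi))
```

The interval keeps shrinking but never closes on a rational number, which is consistent with the claim that no fraction has square exactly 2.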

One can represent certain fractions as decimal fractions, e.g.

$$\frac{279}{25} = \frac{1116}{100} = 11.16.$$

Not all fractions can be represented as finite decimal fractions. For instance, expanding $\frac{1}{3}$ into a decimal fraction
leads to an unending decimal fraction

$$\frac{1}{3} = 0.333\,333\,333\,333\,333 \dots$$

It is impossible to write the complete decimal expansion of $\frac{1}{3}$ because it contains infinitely many digits.
But we can describe the expansion: each digit is a three. An electronic calculator, which always represents
numbers as finite decimal numbers, can never hold the number $\frac{1}{3}$ exactly.

Every fraction can be written as a decimal fraction which may or may not be finite. If the decimal
expansion doesn’t end, then it must repeat. For instance,

$$\frac{1}{7} = 0.142857\,142857\,142857\,142857 \dots$$

Conversely, any infinite repeating decimal expansion represents a rational number.
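
One way to see why a non-terminating expansion of a fraction must repeat is to carry out long division and watch the remainders: each remainder is one of $0, 1, \dots, n - 1$, so either a remainder 0 appears and the expansion ends, or some remainder recurs and the digits cycle from there. The Python sketch below does this for fractions $m/n$ with $0 < m < n$; the function name `decimal_expansion` is chosen here for illustration.

```python
def decimal_expansion(m, n):
    """Long division of m/n (0 < m < n): return (non-repeating digits, repeating digits)."""
    digits = []
    position = {}                 # remainder -> index of the digit it produces
    r = m
    while r != 0 and r not in position:
        position[r] = len(digits)
        r *= 10
        digits.append(str(r // n))
        r %= n
    if r == 0:                    # the expansion terminates
        return "".join(digits), ""
    start = position[r]           # the digits from here on repeat forever
    return "".join(digits[:start]), "".join(digits[start:])

print(decimal_expansion(1, 7))
print(decimal_expansion(3, 25))
```

For $\frac{1}{7}$ the repeating block has six digits, and for $\frac{3}{25}$ the division stops after two digits.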

A real number is specified by a possibly unending decimal expansion. For instance,

$$\sqrt{2} = 1.414\,213\,562\,373\,095\,048\,801\,688\,724\,209\,698\,078\,569\,671\,875\,376\,9 \dots$$

Of course you can never write all the digits in the decimal expansion, so you only write the first few digits
and hide the others behind dots. To give a precise description of a real number (such as $\sqrt{2}$) you have to
explain how you could in principle compute as many digits in the expansion as you would like. During the
next three semesters of calculus we will not go into the details of how this should be done.

1.2. A reason to believe in $\sqrt{2}$. The Pythagorean theorem says that the hypotenuse
of a right triangle with sides 1 and 1 must be a line segment of length $\sqrt{2}$. In
middle or high school you learned something similar to the following geometric construction
of a line segment whose length is $\sqrt{2}$. Take a square with side of length 1, and construct
a new square one of whose sides is the diagonal of the first square. The figure you get
consists of 5 triangles of equal area and by counting triangles you see that the larger
square has exactly twice the area of the smaller square. Therefore the diagonal of the smaller square, being
the side of the larger square, is $\sqrt{2}$ times as long as the side of the smaller square.

Why are real numbers called real? All the numbers we will use in this first semester of calculus are
“real numbers.” At some point (in 2nd semester calculus) it becomes useful to assume that there is a number
whose square is $-1$. No real number has this property since the square of any real number is never negative, so
it was decided to call this new imagined number “imaginary” and to refer to the numbers we already have
(rationals, $\sqrt{2}$-like things) as “real.”

1.3. The real number line and intervals. It is customary to visualize the real numbers as points 
on a straight line. We imagine a line, and choose one point on this line, which we call the origin. We also 
decide which direction we call “left” and hence which we call “right.” Some draw the number line vertically 
and use the words “up” and “down.” 

To plot any real number x one marks off a distance x from the origin, to the right (up) if x > 0, to the 
left (down) if x < 0. 

The distance along the number line between two numbers $x$ and $y$ is $|x - y|$. In particular, the
distance is never a negative number.

Figure 1. To draw the half open interval $[-1, 2)$ use a filled dot to mark the endpoint which is included
and an open dot for an excluded endpoint.

Figure 2. To find $\sqrt{2}$ on the real line you draw a square of sides 1 and drop the diagonal onto the real line.


Almost every equation involving variables x, y , etc. we write down in this course will be true for some 
values of x but not for others. In modern abstract mathematics a collection of real numbers (or any other 
kind of mathematical objects) is called a set. Below are some examples of sets of real numbers. We will use 
the notation from these examples throughout this course. 

The collection of all real numbers between two given real numbers forms an interval. The following
notation is used:

• $(a, b)$ is the set of all real numbers $x$ which satisfy $a < x < b$.

• $[a, b)$ is the set of all real numbers $x$ which satisfy $a \le x < b$.

• $(a, b]$ is the set of all real numbers $x$ which satisfy $a < x \le b$.

• $[a, b]$ is the set of all real numbers $x$ which satisfy $a \le x \le b$.

If the endpoint is not included then it may be $\infty$ or $-\infty$. E.g. $(-\infty, 2]$ is the interval of all real numbers
(both positive and negative) which are $\le 2$.

1.4. Set notation. A common way of describing a set is to say it is the collection of all real numbers
which satisfy a certain condition. One uses this notation

$$\mathcal{A} = \{x \mid x \text{ satisfies this or that condition}\}$$

Most of the time we will use upper case letters in a calligraphic font to denote sets ($\mathcal{A}$, $\mathcal{B}$, $\mathcal{C}$, $\mathcal{D}$, ...).

For instance, the interval $(a, b)$ can be described as

$$(a, b) = \{x \mid a < x < b\}$$

The set

$$\mathcal{B} = \{x \mid x^2 - 1 > 0\}$$

consists of all real numbers $x$ for which $x^2 - 1 > 0$, i.e. it consists of all real numbers $x$ for which either $x > 1$
or $x < -1$ holds. This set consists of two parts: the interval $(-\infty, -1)$ and the interval $(1, \infty)$.

You can try to draw a set of real numbers by drawing the number line and coloring the points belonging 
to that set red, or by marking them in some other way. 

Some sets can be very difficult to draw. For instance,

$$\mathcal{C} = \{x \mid x \text{ is a rational number}\}$$

can’t be accurately drawn. In this course we will try to avoid such sets.

Sets can also contain just a few numbers, like

$$\mathcal{D} = \{1, 2, 3\}$$

which is the set containing the numbers one, two and three. Or the set

$$\mathcal{E} = \{x \mid x^3 - 4x^2 + 1 = 0\}$$

which consists of the solutions of the equation $x^3 - 4x^2 + 1 = 0$. (There are three of them, but it is not easy
to give a formula for the solutions.)

If $\mathcal{A}$ and $\mathcal{B}$ are two sets then the union of $\mathcal{A}$ and $\mathcal{B}$ is the set which contains all numbers that belong
either to $\mathcal{A}$ or to $\mathcal{B}$. The following notation is used

$$\mathcal{A} \cup \mathcal{B} = \{x \mid x \text{ belongs to } \mathcal{A} \text{ or to } \mathcal{B} \text{ or both}\}.$$

Similarly, the intersection of two sets $\mathcal{A}$ and $\mathcal{B}$ is the set of numbers which belong to both sets. This
notation is used:

$$\mathcal{A} \cap \mathcal{B} = \{x \mid x \text{ belongs to both } \mathcal{A} \text{ and } \mathcal{B}\}.$$


2. Exercises 


1. What is the 2007th digit after the period in the expansion
of $\frac{1}{7}$?

2. Which of the following fractions have finite decimal
expansions?

$$a = \frac{2}{3}, \quad b = \frac{3}{25}, \quad c = \frac{276937}{15625}.$$

3. Draw the following sets of real numbers. Each of these
sets is the union of one or more intervals. Find those
intervals. Which of these sets are finite?

$\mathcal{A} = \{x \mid x^2 - 3x + 2 < 0\}$

$\mathcal{B} = \{x \mid x^2 - 3x + 2 > 0\}$

$\mathcal{C} = \{x \mid x^2 - 3x > 3\}$

$\mathcal{D} = \{x \mid x^2 - 5 > 2x\}$

$\mathcal{E} = \{t \mid t^2 - 3t + 2 < 0\}$

$\mathcal{F} = \{\alpha \mid \alpha^2 - 3\alpha + 2 > 0\}$

$\mathcal{G} = (0, 1) \cup (5, 7]$

$\mathcal{H} = (\{1\} \cup \{2, 3\}) \cap (0, 2\sqrt{2})$

$\mathcal{Q} = \{\theta \mid \sin\theta = \frac{1}{2}\}$

$\mathcal{R} = \{\varphi \mid \cos\varphi > 0\}$


4. Suppose $A$ and $B$ are intervals. Is it always true that
$A \cap B$ is an interval? How about $A \cup B$?

5. Consider the sets

$$\mathcal{M} = \{x \mid x > 0\} \quad \text{and} \quad \mathcal{N} = \{y \mid y > 0\}.$$

Are these sets the same?

6. Group Problem.

Write the numbers

$$x = 0.3131313131\dots, \quad y = 0.273273273273\dots \quad \text{and} \quad z = 0.21541541541541541\dots$$

as fractions (i.e. write them as $\frac{m}{n}$, specifying $m$ and $n$).

(Hint: show that $100x = x + 31$. A similar trick
works for $y$, but $z$ is a little harder.)

7. Group Problem.

Is the number whose decimal expansion after the
period consists only of nines, i.e.

$$x = 0.99999999999999999\dots$$

an integer?


3. Functions 


Wherein we meet the main characters of this semester 


3.1. Definition. To specify a function $f$ you must

(1) give a rule which tells you how to compute the value $f(x)$ of the function for a given real number
$x$, and:

(2) say for which real numbers $x$ the rule may be applied.

The set of numbers for which a function is defined is called its domain. The set of all possible numbers $f(x)$
as $x$ runs over the domain is called the range of the function. The rule must be unambiguous: the same
$x$ must always lead to the same $f(x)$.

For instance, one can define a function $f$ by putting $f(x) = \sqrt{x}$ for all $x \ge 0$. Here the rule defining $f$ is
“take the square root of whatever number you’re given”, and the function $f$ will accept all nonnegative real
numbers.


The rule which specifies a function can come in many different forms. Most often it is a formula, as in
the square root example of the previous paragraph. Sometimes you need a few formulas, as in

$$g(x) = \begin{cases} 2x & \text{for } x < 0 \\ x^2 & \text{for } x \ge 0 \end{cases} \qquad \text{domain of } g = \text{all real numbers.}$$

Functions which are defined by different formulas on different intervals are sometimes called piecewise
defined functions.
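
A piecewise defined function translates directly into a conditional. The Python sketch below encodes the rule for $g$, assuming the second formula applies at $x = 0$ so that $g$ is defined for every real number as stated.

```python
def g(x):
    """Piecewise rule: 2x for x < 0 and x^2 for x >= 0, defined for every real x."""
    if x < 0:
        return 2 * x
    return x ** 2

for x in (-3, -1, 0, 1, 3):
    print(x, g(x))
```

Each input falls into exactly one branch, so the rule is unambiguous, as the definition of a function requires.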


3.2. Graphing a function. You get the graph of a function $f$ by drawing all points whose coordinates
are $(x, y)$ where $x$ must be in the domain of $f$ and $y = f(x)$.

Figure 3. The graph of a function $f$. The domain of $f$ consists of all $x$ values at which the function is
defined, and the range consists of all possible values $f$ can have.

Figure 4. A straight line and its slope. The line is the graph of $f(x) = mx + n$. It intersects the $y$-axis
at height $n$, and the ratio between the amounts by which $y$ and $x$ increase as you move from one point
to another on the line is $\frac{y_1 - y_0}{x_1 - x_0} = m$.


3.3. Linear functions. A function which is given by the formula

$$f(x) = mx + n$$

where $m$ and $n$ are constants is called a linear function. Its graph is a straight line. The constants $m$
and $n$ are the slope and $y$-intercept of the line. Conversely, any straight line which is not vertical (i.e. not
parallel to the $y$-axis) is the graph of a linear function. If you know two points $(x_0, y_0)$ and $(x_1, y_1)$ on the
line, then you can compute the slope $m$ from the “rise-over-run” formula

$$m = \frac{y_1 - y_0}{x_1 - x_0}.$$

This formula actually contains a theorem from Euclidean geometry, namely it says that the ratio $(y_1 - y_0) :
(x_1 - x_0)$ is the same for every pair of points $(x_0, y_0)$ and $(x_1, y_1)$ that you could pick on the line.
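
The rise-over-run formula can be tried out on concrete points. The following Python sketch recovers $m$ and $n$ from two points with integer coordinates; the function name `line_through` and the sample points are choices made here for illustration, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def line_through(p0, p1):
    """Return (m, n) with f(x) = m*x + n passing through p0 and p1 (integer coordinates)."""
    (x0, y0), (x1, y1) = p0, p1
    if x0 == x1:
        raise ValueError("a vertical line is not the graph of a function")
    m = Fraction(y1 - y0, x1 - x0)   # rise over run
    n = y0 - m * x0                  # y-intercept
    return m, n

# Two different pairs of points on the line y = 2x + 1 give the same slope and intercept.
print(line_through((1, 3), (3, 7)))
print(line_through((0, 1), (2, 5)))
```

That both pairs of points give the same result is an instance of the geometric fact mentioned above: the ratio does not depend on which two points of the line you pick.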

3.4. Domain and “biggest possible domain.” In this course we will usually not be careful about
specifying the domain of the function. When this happens the domain is understood to be the set of all $x$
for which the rule which tells you how to compute $f(x)$ is meaningful. For instance, if we say that $h$ is the
function

$$h(x) = \sqrt{x}$$

then the domain of $h$ is understood to be the set of all nonnegative real numbers

domain of $h$ = $[0, \infty)$

since $\sqrt{x}$ is well-defined for all $x \ge 0$ and undefined for $x < 0$.

Figure 5. The graph of $y = x^3 - x$ fails the “horizontal line test,” but it passes the “vertical line test.”
The circle fails both tests.

A systematic way of finding the domain and range of a function for which you are only given a formula is
as follows:

• The domain of $f$ consists of all $x$ for which $f(x)$ is well-defined (“makes sense”).

• The range of $f$ consists of all $y$ for which you can solve the equation $f(x) = y$.

3.5. Example: find the domain and range of $f(x) = 1/x^2$. The expression $1/x^2$ can be computed
for all real numbers $x$ except $x = 0$ since this leads to division by zero. Hence the domain of the function
$f(x) = 1/x^2$ is

“all real numbers except 0” $= \{x \mid x \ne 0\} = (-\infty, 0) \cup (0, \infty)$.

To find the range we ask “for which $y$ can we solve the equation $y = f(x)$ for $x$,” i.e. for which $y$ can you
solve $y = 1/x^2$ for $x$?

If $y = 1/x^2$ then we must have $x^2 = 1/y$, so first of all, since we have to divide by $y$, $y$ can’t be zero.
Furthermore, $1/y = x^2$ says that $y$ must be positive. On the other hand, if $y > 0$ then $y = 1/x^2$ has a solution
(in fact two solutions), namely $x = \pm 1/\sqrt{y}$. This shows that the range of $f$ is

“all positive real numbers” $= \{x \mid x > 0\} = (0, \infty)$.

3.6. Functions in “real life. ” One can describe the motion of an object using a function. If some 
object is moving along a straight line, then you can define the following function: Let x(t) be the distance 
from the object to a fixed marker on the line, at the time t. Here the domain of the function is the set of all 
times t for which we know the position of the object, and the rule is 

Given $t$, measure the distance between the object and the marker at time $t$.

There are many examples of this kind. For instance, a biologist could describe the growth of a cell by 
defining m(t) to be the mass of the cell at time t (measured since the birth of the cell). Here the domain is 
the interval [0,T], where T is the life time of the cell, and the rule that describes the function is 

Given t, weigh the cell at time t. 

3.7. The Vertical Line Property. Generally speaking graphs of functions are curves in the plane but
they distinguish themselves from arbitrary curves by the way they intersect vertical lines: The graph of
a function cannot intersect a vertical line “$x$ = constant” in more than one point. The reason
why this is true is very simple: if two points lie on a vertical line, then they have the same $x$-coordinate, so if
they also lie on the graph of a function $f$, then their $y$-coordinates must also be equal, namely $f(x)$.

3.8. Examples. The graph of $f(x) = x^3 - x$ “goes up and down,” and, even though it intersects several
horizontal lines in more than one point, it intersects every vertical line in exactly one point.

The collection of points determined by the equation $x^2 + y^2 = 1$ is a circle. It is not the graph of a
function since the vertical line $x = 0$ (the $y$-axis) intersects the graph in two points $P_1(0, 1)$ and $P_2(0, -1)$.
See Figure 6.


4. Inverse functions and Implicit functions 

For many functions the rule which tells you how to compute it is not an explicit formula, but instead an 
equation which you still must solve. A function which is defined in this way is called an “implicit function.” 

4.1. Example. One can define a function $f$ by saying that for each $x$ the value of $f(x)$ is the solution $y$
of the equation

$$x^2 + 2y - 3 = 0.$$

In this example you can solve the equation for $y$,

$$y = \frac{3 - x^2}{2}.$$

Thus we see that the function we have defined is $f(x) = (3 - x^2)/2$.

Here we have two definitions of the same function, namely

(i) “$y = f(x)$ is defined by $x^2 + 2y - 3 = 0$,” and

(ii) “$f$ is defined by $f(x) = (3 - x^2)/2$.”

The first definition is the implicit definition, the second is explicit. You see that with an “implicit function”
it isn’t the function itself, but rather the way it was defined that’s implicit.


4.2. Another example: domain of an implicitly defined function. Define $g$ by saying that for
any $x$ the value $y = g(x)$ is the solution of

$$x^2 + xy - 3 = 0.$$

Just as in the previous example one can then solve for $y$, and one finds that

$$g(x) = y = \frac{3 - x^2}{x}.$$

Unlike the previous example this formula does not make sense when $x = 0$, and indeed, for $x = 0$ our rule for
$g$ says that $g(0) = y$ is the solution of

$$0^2 + 0 \cdot y - 3 = 0, \quad \text{i.e. } y \text{ is the solution of } 3 = 0.$$

That equation has no solution and hence $x = 0$ does not belong to the domain of our function $g$.




Figure 6. The circle determined by $x^2 + y^2 = 1$ is not the graph of a function, but it contains the graphs
of the two functions $h_1(x) = \sqrt{1 - x^2}$ and $h_2(x) = -\sqrt{1 - x^2}$.
4.3. Example: the equation alone does not determine the function. Define $y = h(x)$ to be the
solution of

    $x^2 + y^2 = 1.$

If $x > 1$ or $x < -1$ then $x^2 > 1$ and there is no solution, so $h(x)$ is at most defined when $-1 \le x \le 1$. But
when $-1 < x < 1$ there is another problem: not only does the equation have a solution, but it even has two
solutions:

    $x^2 + y^2 = 1 \iff y = \sqrt{1 - x^2}$ or $y = -\sqrt{1 - x^2}.$

The rule which defines a function must be unambiguous, and since we have not specified which of these two
solutions is $h(x)$ the function is not defined for $-1 < x < 1$.

One can fix this by making a choice, but there are many possible choices. Here are three possibilities:

    $h_1(x)$ = the nonnegative solution $y$ of $x^2 + y^2 = 1$

    $h_2(x)$ = the nonpositive solution $y$ of $x^2 + y^2 = 1$

    $h_3(x) = \begin{cases} h_1(x) & \text{when } x < 0 \\ h_2(x) & \text{when } x \ge 0 \end{cases}$


4.4. Why use implicit functions? In all the examples we have done so far we could replace the
implicit description of the function with an explicit formula. This is not always possible, and even when it is possible the
implicit description can be much simpler than the explicit formula. For instance, you can define a function $f$ by
saying that $y = f(x)$ if and only if

(1)    $y^3 + 3y + 2x = 0.$

This means that the recipe for computing $f(x)$ for any given $x$ is “solve the equation $y^3 + 3y + 2x = 0$.”
E.g. to compute $f(0)$ you set $x = 0$ and solve $y^3 + 3y = 0$. The only solution is $y = 0$, so $f(0) = 0$. To
compute $f(2)$ you have to solve $y^3 + 3y + 2 \cdot 2 = 0$, and if you’re lucky you see that $y = -1$ is the solution,
and $f(2) = -1$.

In general, no matter what $x$ is, the equation (1) turns out to have exactly one solution $y$ (which depends
on $x$; this is how you get the function $f$). Solving (1) is not easy. In the early 1500s Cardano and Tartaglia
discovered a formula$^1$ for the solution. Here it is:

    $y = f(x) = \sqrt[3]{-x + \sqrt{1 + x^2}} - \sqrt[3]{x + \sqrt{1 + x^2}}.$

The implicit description looks a lot simpler, and when we try to differentiate this function later on, it will be
much easier to use “implicit differentiation” than to use the Cardano-Tartaglia formula directly.
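The recipe “solve the equation for $y$” can also be carried out numerically. The following is only a sketch, not part of the original notes: the names `f_implicit` and `f_cardano`, the bisection method, and the sample values of $x$ are all choices made here for illustration. It solves $y^3 + 3y + 2x = 0$ by bisection and compares the result with the closed formula $\sqrt[3]{-x + \sqrt{1 + x^2}} - \sqrt[3]{x + \sqrt{1 + x^2}}$.

```python
import math

def f_implicit(x, iterations=200):
    """Solve y**3 + 3*y + 2*x = 0 for y by bisection (illustrative helper)."""
    def F(y):
        return y**3 + 3*y + 2*x

    # F is strictly increasing in y (its derivative 3*y**2 + 3 is positive)
    # and changes sign on [-(|x| + 1), |x| + 1], so there is exactly one root.
    lo, hi = -(abs(x) + 1.0), abs(x) + 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if F(mid) > 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2.0

def f_cardano(x):
    """Evaluate the closed formula for the same equation (illustrative helper)."""
    r = math.sqrt(1.0 + x * x)
    # Both radicands are nonnegative because sqrt(1 + x^2) > |x|.
    return (-x + r) ** (1.0 / 3.0) - (x + r) ** (1.0 / 3.0)

# Arbitrary sample inputs; the two columns should agree up to rounding.
for x in (0.0, 2.0, -3.0):
    print(x, f_implicit(x), f_cardano(x))
```

Bisection is used here only because it needs nothing beyond the fact that the left-hand side increases with $y$; any other root-finding method would serve equally well.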


4.5. Inverse functions. If you have a function $f$, then you can try to define a new function $f^{-1}$, the
so-called inverse function of $f$, by the following prescription:

(2) For any given $x$ we say that $y = f^{-1}(x)$ if $y$ is the solution to the equation $f(y) = x$.

So to find $y = f^{-1}(x)$ you solve the equation $x = f(y)$. If this is to define a function then the prescription
(2) must be unambiguous: the equation $f(y) = x$ has to have a solution and cannot have more than one
solution.


1 To see the solution and its history visit 
Figure 7. The graph of a function and its inverse are mirror images of each other.


4.6. Examples. Consider the function $f$ with $f(x) = 2x + 3$. Then the equation $f(y) = x$ works out to
be

    $2y + 3 = x$

and this has the solution

    $y = \frac{x - 3}{2}.$

So $f^{-1}(x)$ is defined for all $x$, and it is given by $f^{-1}(x) = (x - 3)/2$.

Next we consider the function $g(x) = x^2$ with domain all nonnegative real numbers. To see for which $x$ the
inverse $g^{-1}(x)$ is defined we try to solve the equation $g(y) = x$, i.e. we try to solve $y^2 = x$. If $x < 0$ then this
equation has no solutions since $y^2 \ge 0$ for all $y$. But if $x \ge 0$ then $y^2 = x$ does have a solution in the domain of $g$, namely $y = \sqrt{x}$.

So we see that $g^{-1}(x)$ is defined for all nonnegative real numbers $x$, and that it is given by $g^{-1}(x) = \sqrt{x}$.
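A quick way to test a candidate inverse is to check that it undoes the original function. The sketch below (the Python names `f`, `f_inv`, `g`, `g_inv` and the sample inputs are chosen here for illustration) applies each function and then its inverse to a few nonnegative numbers.

```python
import math

def f(x):
    return 2 * x + 3

def f_inv(x):
    return (x - 3) / 2

def g(x):
    # Only meant to be used for x >= 0.
    return x * x

def g_inv(x):
    return math.sqrt(x)

# Each row should show the same number three times.
for x in (0.5, 2.0, 7.25):
    print(x, f_inv(f(x)), g_inv(g(x)))
```

Such a check can expose a wrong formula, but it cannot prove that a formula is right for every $x$.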


4.7. Inverse trigonometric functions. The familiar trigonometric functions Sine, Cosine and
Tangent have inverses which are called arcsine, arccosine and arctangent.

    $y = f(x)$                                     $x = f^{-1}(y)$

    $y = \sin x$   $(-\pi/2 \le x \le \pi/2)$      $x = \arcsin(y)$   $(-1 \le y \le 1)$

    $y = \cos x$   $(0 \le x \le \pi)$             $x = \arccos(y)$   $(-1 \le y \le 1)$

    $y = \tan x$   $(-\pi/2 < x < \pi/2)$          $x = \arctan(y)$

The notations $\arcsin y = \sin^{-1} y$, $\arccos x = \cos^{-1} x$, and $\arctan u = \tan^{-1} u$ are also commonly used for
the inverse trigonometric functions. We will avoid the $\sin^{-1} y$ notation because it is ambiguous. Namely,
everybody writes the square of $\sin y$ as

    $(\sin y)^2 = \sin^2 y.$

Replacing the 2’s by $-1$’s would lead to

    $\arcsin y = \sin^{-1} y = (\sin y)^{-1} = \frac{1}{\sin y}$, which is not true!
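The difference between the inverse function and the reciprocal is easy to see numerically. In this small sketch the input value 0.5 is an arbitrary choice; `math.asin` and `math.sin` are the standard-library arcsine and sine.

```python
import math

y = 0.5
arcsine = math.asin(y)           # the angle in [-pi/2, pi/2] whose sine is y
reciprocal = 1.0 / math.sin(y)   # one divided by the sine of y

print(arcsine)      # close to pi/6
print(reciprocal)   # a different number, larger than 2
```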


5. Exercises 


8. The functions $f$ and $g$ are defined by

    $f(x) = x^2$ and $g(s) = s^2$.

Are $f$ and $g$ the same functions or are they different?

9. Find a formula for the function $f$ which is defined by

    $y = f(x) \iff x^2 y + y = 7.$

What is the domain of $f$?

10. Find a formula for the function $f$ which is defined by

    $y = f(x) \iff x^2 y - y = 6.$

What is the domain of $f$?

11. Let $f$ be the function defined by $y = f(x) \iff y$ is
the largest solution of

    $y^2 = 3x^2 - 2xy.$

Find a formula for $f$. What are the domain and range of $f$?

12. Find a formula for the function $f$ which is defined by

    $y = f(x) \iff 2x + 2xy + y^2 = 5$ and $y > -x$.

Find the domain of $f$.

13. Use a calculator to compute $f(1.2)$ to three decimals,
where $f$ is the implicitly defined function from §4.4.
(There are (at least) two different ways of finding $f(1.2)$.)


14. Group Problem.

(a) True or false:

for all $x$ one has $\sin(\arcsin x) = x$?

(b) True or false:

for all $x$ one has $\arcsin(\sin x) = x$?

15. On a graphing calculator plot the graphs of the following
functions, and explain the results. (Hint: first do the
previous exercise.)

    $f(x) = \arcsin(\sin x)$,               $-2\pi \le x \le 2\pi$
    $g(x) = \arcsin(x) + \arccos(x)$,       $0 \le x \le 1$
    $h(x) = \arctan\frac{\sin x}{\cos x}$,  $|x| < \pi/2$
    $k(x) = \arctan\frac{\cos x}{\sin x}$,  $|x| < \pi/2$
    $l(x) = \arcsin(\cos x)$,               $-\pi \le x \le \pi$
    $m(x) = \cos(\arcsin x)$,               $-1 \le x \le 1$


16. Find the inverse of the function $f$ which is given by
$f(x) = \sin x$ and whose domain is $\pi \le x \le 2\pi$. Sketch
the graphs of both $f$ and $f^{-1}$.

17. Find a number $a$ such that the function $f(x) =
\sin(x + \pi/4)$ with domain $a \le x \le a + \pi$ has an inverse.
Give a formula for $f^{-1}(x)$ using the arcsine function.

18. Draw the graph of the function $h_3$ from §4.3.

19. A function $f$ is given which satisfies

    $f(2x + 3) = x^2$
for all real numbers $x$.

Compute

(a) $f(0)$ (b) $f(3)$ (c) $f(x)$

(d) $f(y)$ (e) $f(f(2))$

where $x$ and $y$ are arbitrary real numbers.

What are the range and domain of $f$?

20. A function $f$ is given which satisfies


for all real numbers x. 

Compute 

(a) $f(1)$ (b) $f(0)$ (c) $f(x)$

(d) $f(t)$ (e) $f(f(2))$

where $x$ and $t$ are arbitrary real numbers.

What are the range and domain of $f$?

21. Does there exist a function $f$ which satisfies
    $f(x^2) = x + 1$
for all real numbers $x$?


* * * 

The following exercises review precalculus material involving
quadratic expressions $ax^2 + bx + c$ in one way or
another.

22. Explain how you “complete the square” in a quadratic
expression like $ax^2 + bx$.

23. Find the range of the following functions:

    $f(x) = 2x^2 + 3$
    $g(x) = -2x^2 + 4x$
    $h(x) = 4x + x^2$
    $k(x) = 4\sin x + \sin^2 x$
    $l(x) = 1/(1 + x^2)$
    $m(x) = 1/(3 + 2x + x^2)$.

24. Group Problem.

For each real number $a$ we define a line $\ell_a$ with
equation $y = ax + a^2$.

(a) Draw the lines corresponding to $a =
-2, -1, -\frac{1}{2}, 0, \frac{1}{2}, 1, 2$.

(b) Does the point with coordinates $(3, 2)$ lie on one
or more of the lines $\ell_a$ (where $a$ can be any number, not
just the values from part (a))? If so, for which values
of $a$ does $(3, 2)$ lie on $\ell_a$?

(c) Which points in the plane lie on at least one of
the lines $\ell_a$?

25. For which values of $m$ and $n$ does the graph of
$f(x) = mx + n$ intersect the graph of $g(x) = 1/x$ in
exactly one point and also contain the point $(-1, 1)$?

26. For which values of $m$ and $n$ does the graph of
$f(x) = mx + n$ not intersect the graph of $g(x) = 1/x$?
Derivatives (1)


To work with derivatives you have to know what a limit is, but to motivate why we are going to study 
limits let’s first look at the two classical problems that gave rise to the notion of a derivative: the tangent to 
a curve, and the instantaneous velocity of a moving object. 


1. The tangent to a curve 

Suppose you have a function $y = f(x)$ and you draw its graph. If you want to find the tangent to the
graph of $f$ at some given point on the graph of $f$, how would you do that?



Let $P$ be the point on the graph at which you want to draw the tangent. If you are making a real paper and
ink drawing you would take a ruler, make sure it goes through $P$ and then turn it until it doesn’t cross the
graph anywhere else.

If you are using equations to describe the curve and lines, then you could pick a point Q on the graph 
and construct the line through P and Q (“construct” means “find an equation for”). This line is called a 
“secant,” and it is of course not the tangent that you’re looking for. But if you choose Q to be very close to P 
then the secant will be close to the tangent. 
So this is our recipe for constructing the tangent through P: pick another point Q on the graph, find the
line through P and Q, and see what happens to this line as you take Q closer and closer to P. The resulting 
secants will then get closer and closer to some line, and that line is the tangent. 

We’ll write this in formulas in a moment, but first let’s worry about how close Q should be to P. We 
can’t set Q equal to P, because then P and Q don’t determine a line (you need two points to determine a 
line). If you choose Q different from P then you don’t get the tangent, but at best something that is “close” 
to it. Some people have suggested that one should take Q “infinitely close” to P, but it isn’t clear what that 
would mean. The concept of a limit is meant to solve this confusing problem. 


2. An example — tangent to a parabola 


To make things more concrete, suppose that the function we had was $f(x) = x^2$, and that the point was
$(1, 1)$. The graph of $f$ is of course a parabola.

Any line through the point $P(1, 1)$ has equation

    $y - 1 = m(x - 1)$

where $m$ is the slope of the line. So instead of finding the equation of the secant and tangent lines we will
find their slopes.

Let $Q$ be the other point on the parabola, with coordinates $(x, x^2)$. We can
“move $Q$ around on the graph” by changing $x$. Whatever $x$ we choose, it must be
different from 1, for otherwise $P$ and $Q$ would be the same point. What we want to
find out is how the line through $P$ and $Q$ changes if $x$ is changed (and in particular, if
$x$ is chosen very close to 1). Now, as one changes $x$ one thing stays the same, namely,
the secant still goes through $P$. So to describe the secant we only need to know its
slope. By the “rise over run” formula, the slope of the secant line joining $P$ and $Q$ is

    $m_{PQ} = \frac{\Delta y}{\Delta x}$ where $\Delta y = x^2 - 1$ and $\Delta x = x - 1$.



By factoring $x^2 - 1$ we can rewrite the formula for the slope as follows

(3)    $m_{PQ} = \frac{\Delta y}{\Delta x} = \frac{x^2 - 1}{x - 1} = \frac{(x - 1)(x + 1)}{x - 1} = x + 1.$

As $x$ gets closer to 1, the slope $m_{PQ}$, being $x + 1$, gets closer to the value $1 + 1 = 2$. We say that

    the limit of the slope $m_{PQ}$ as $Q$ approaches $P$ is 2.

In symbols,

    $\lim_{Q \to P} m_{PQ} = 2,$

or, since $Q$ approaching $P$ is the same as $x$ approaching 1,

(4)    $\lim_{x \to 1} m_{PQ} = 2.$


So we find that the tangent line to the parabola $y = x^2$ at the point $(1, 1)$ has equation

    $y - 1 = 2(x - 1)$, i.e. $y = 2x - 1$.
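The way the secant slopes settle down can be watched numerically. This is only an illustrative sketch: the function name `secant_slope` and the sample values of $x$ are chosen here, and the slope is computed from the rise-over-run quotient itself rather than from the simplified form $x + 1$.

```python
def secant_slope(x):
    """Slope of the line through P(1, 1) and Q(x, x**2), for x != 1."""
    if x == 1:
        raise ValueError("P and Q coincide, so the secant is not defined")
    return (x**2 - 1) / (x - 1)

# Move Q toward P from both sides and print the slope of each secant.
for x in (2.0, 1.5, 1.1, 1.01, 1.001, 0.999, 0.99):
    print(x, secant_slope(x))
```

The printed slopes approach 2 as $x$ approaches 1, while the call `secant_slope(1)` is rejected, which mirrors the fact that no secant exists when $Q = P$.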


A warning: you cannot substitute $x = 1$ in equation (3) to get (4) even though it looks like that’s what we
did. The reason why you can’t do that is that when $x = 1$ the point $Q$ coincides with the point $P$ so “the
line through $P$ and $Q$” is not defined; also, if $x = 1$ then $\Delta x = \Delta y = 0$ so that the rise-over-run formula for
the slope gives

    $m_{PQ} = \frac{\Delta y}{\Delta x} = \frac{0}{0} =$ undefined.

It is only after the algebra trick in (3) that setting $x = 1$ gives something that is well defined. But if the
intermediate steps leading to $m_{PQ} = x + 1$ aren’t valid for $x = 1$ why should the final result mean anything
for $x = 1$?

Something more complicated has happened. We did a calculation which is valid for all $x \ne 1$, and later
looked at what happens if $x$ gets “very close to 1.” This is the concept of a limit and we’ll study it in more
detail later in this section, but first another example.


3. Instantaneous velocity 


If you try to define “instantaneous velocity” you will again end up trying to divide zero by zero. Here is
how it goes: When you are driving in your car the speedometer tells you how fast you are going, i.e. what
your velocity is. What is this velocity? What does it mean if the speedometer says “50mph”?


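[Figure: positions at Time = $t$ and at Time = $t + \Delta t$, measured from $s = 0$; the position at time $t$ is $s(t)$ and the distance covered in between is $\Delta s = s(t + \Delta t) - s(t)$.]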


We all know what average velocity is. Namely, if it takes you two hours to cover 100 miles, then your
average velocity was

    $\frac{\text{distance traveled}}{\text{time it took}} = 50$ miles per hour.

This is not the number the speedometer provides you: it doesn’t wait two hours, measure how far you went
and compute distance/time. If the speedometer in your car tells you that you are driving 50mph, then that
should be your velocity at the moment that you look at your speedometer, i.e. “distance traveled over time
it took” at the moment you look at the speedometer. But during the moment you look at your speedometer
no time goes by (because a moment has no length) and you didn’t cover any distance, so your velocity at that
moment is $\frac{0}{0}$, i.e. undefined. Your velocity at any moment is undefined. But then what is the speedometer
telling you?


To put all this into formulas we need to introduce some notation. Let $t$ be the time (in hours) that has
passed since we got onto the road, and let $s(t)$ be the distance we have covered since then.

Instead of trying to find the velocity exactly at time $t$, we find a formula for the average velocity during
some (short) time interval beginning at time $t$. We’ll write $\Delta t$ for the length of the time interval.

At time $t$ we have traveled $s(t)$ miles. A little later, at time $t + \Delta t$ we have traveled $s(t + \Delta t)$. Therefore
during the time interval from $t$ to $t + \Delta t$ we have moved $s(t + \Delta t) - s(t)$ miles. Our average velocity in that
time interval is therefore

    $\frac{s(t + \Delta t) - s(t)}{\Delta t}$ miles per hour.

The shorter you make the time interval, i.e. the smaller you choose $\Delta t$, the closer this number should be to
the instantaneous velocity at time $t$.

So we have the following formula (definition, really) for the velocity at time $t$:

(5)    $v(t) = \lim_{\Delta t \to 0} \frac{s(t + \Delta t) - s(t)}{\Delta t}.$
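To see the averaging at work, one can pick a concrete distance function and shrink the time interval. The function `s` below is purely hypothetical (it is not taken from the notes); it was chosen so that the average velocity over the interval from $t$ to $t + \Delta t$ works out to $30 + 20t + 10\,\Delta t$ miles per hour.

```python
def s(t):
    """Hypothetical distance (in miles) covered after t hours."""
    return 30.0 * t + 10.0 * t**2

def average_velocity(t, dt):
    """Average velocity over the time interval from t to t + dt."""
    return (s(t + dt) - s(t)) / dt

# Shorter and shorter intervals starting at t = 1 hour.
for dt in (1.0, 0.1, 0.01, 0.001):
    print(dt, average_velocity(1.0, dt))
```

For this particular `s` the averages at $t = 1$ are $50 + 10\,\Delta t$, so they approach 50 as the interval shrinks; that limiting value is what a speedometer reading at $t = 1$ would correspond to.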


4. Rates of change 

The two previous examples have much in common. If we ignore all the details about geometry, graphs, 
highways and motion, the following happened in both examples: 

We had a function $y = f(x)$, and we wanted to know how much $f(x)$ changes if $x$ changes. If you change
$x$ to $x + \Delta x$, then $y$ will change from $f(x)$ to $f(x + \Delta x)$. The change in $y$ is therefore

    $\Delta y = f(x + \Delta x) - f(x),$

and the average rate of change is

(6)    $\frac{\Delta y}{\Delta x} = \frac{f(x + \Delta x) - f(x)}{\Delta x}.$
This is the average rate of change of $f$ over the interval from $x$ to $x + \Delta x$. To define the rate of change of
the function $f$ at $x$ we let the length $\Delta x$ of the interval become smaller and smaller, in the hope that the
average rate of change over the shorter and shorter intervals will get closer and closer to some number.
If that happens then that “limiting number” is called the rate of change of $f$ at $x$, or, the derivative of $f$ at
$x$. It is written as

(7)    $f'(x) = \lim_{\Delta x \to 0} \frac{f(x + \Delta x) - f(x)}{\Delta x}.$

Derivatives and what you can do with them are what the first half of this semester is about. The description 
we just went through shows that to understand what a derivative is you need to know what a limit is. In the 
next chapter we’ll study limits so that we get a less vague understanding of formulas like (7). 


5. Examples of rates of change 

5.1. Acceleration as the rate at which velocity changes. As you are driving in your car your
velocity does not stay constant; it changes with time. Suppose $v(t)$ is your velocity at time $t$ (measured
in miles per hour). You could try to figure out how fast your velocity is changing by measuring it at one
moment in time (you get $v(t)$), then measuring it a little later (you get $v(t + \Delta t)$). You conclude that your
velocity increased by $\Delta v = v(t + \Delta t) - v(t)$ during a time interval of length $\Delta t$, and hence

    {average rate at which your velocity changed} $= \frac{\Delta v}{\Delta t} = \frac{v(t + \Delta t) - v(t)}{\Delta t}.$

This rate of change is called your average acceleration (over the time interval from $t$ to $t + \Delta t$). Your
instantaneous acceleration at time $t$ is the limit of your average acceleration as you make the time interval
shorter and shorter:

    {acceleration at time $t$} $= a = \lim_{\Delta t \to 0} \frac{v(t + \Delta t) - v(t)}{\Delta t}.$

Both the average and instantaneous accelerations are measured in “miles per hour per hour,” i.e. in

    (mi/h)/h = mi/h$^2$.

Or, if you had measured distances in meters and time in seconds then velocities would be measured in meters
per second, and acceleration in meters per second per second, which is the same as meters per second$^2$, i.e.
“meters per squared second.”


5.2. Reaction rates. Think of a chemical reaction in which two substances A and B react to form
AB$_2$ according to the reaction

    A + 2B $\to$ AB$_2$.

If the reaction is taking place in a closed reactor, then the “amounts” of A and B will be decreasing, while the
amount of AB$_2$ will increase. Chemists write [A] for the amount of “A” in the chemical reactor (measured in
moles). Clearly [A] changes with time so it defines a function. We’re mathematicians so we will write “[A]($t$)”
for the number of moles of A present at time $t$.

To describe how fast the amount of A is changing we consider the derivative of [A] with respect to time,

    $[A]'(t) = \lim_{\Delta t \to 0} \frac{[A](t + \Delta t) - [A](t)}{\Delta t}.$

This quantity is the rate of change of [A]. The notation “$[A]'(t)$” is really only used by calculus professors. If
you open a paper on chemistry you will find that the derivative is written in Leibniz notation:

    $\frac{d[A]}{dt}$

More on this in §1.2.

How fast does the reaction take place? If you add more A or more B to the reactor then you would expect
that the reaction would go faster, i.e. that more AB$_2$ is being produced per second. The law of mass-action
kinetics from chemistry states this more precisely. For our particular reaction it would say that the rate at
which A is consumed is given by

in which the constant k is called the reaction constant. It’s a constant that you could try to measure by 
timing how fast the reaction goes. 


6. Exercises 


27. Repeat the reasoning in §2 to find the slope at the 
point (|, |), or more generally at any point (a, a 2 ) on 
the parabola with equation y = x 2 . 


31. Look ahead at Figure 3 in the next chapter. What is
the derivative of $f(x) = x \cos\frac{\pi}{x}$ at the points A and B
on the graph?


28. Repeat the reasoning in §2 to find the slope at the 
point (|, |), or more generally at any point (a,a 3 ) on 
the curve with equation y = x 3 . 

29. Group Problem. 

Should you trust your calculator? 

Find the slope of the tangent to the parabola y = x 2 
at the point (|, |) (You have already done this: see 
exercise 27). 

Instead of doing the algebra you could try to compute
the slope by using a calculator. This exercise is about
how you do that and what happens if you try (too hard).

Compute $\frac{\Delta y}{\Delta x}$ for various values of $\Delta x$:

    $\Delta x = 0.1,\ 0.01,\ 0.001,\ 10^{-6},\ 10^{-12}.$

As you choose $\Delta x$ smaller your computed $\frac{\Delta y}{\Delta x}$ ought to
get closer to the actual slope. Use at least 10 decimals
and organize your results in a table like this:

    $\Delta x$       $f(a)$     $f(a + \Delta x)$     $\Delta y$     $\Delta y/\Delta x$
    0.1
    0.01
    0.001
    $10^{-6}$
    $10^{-12}$

Look carefully at the ratios $\Delta y/\Delta x$. Do they look like
they are converging to some number? Compare the values
of $\frac{\Delta y}{\Delta x}$ with the true value you got in the beginning of
this problem.


30. Simplify the algebraic expressions you get when you
compute $\Delta y$ and $\Delta y/\Delta x$ for the following functions

(a) $y = x^2 - 2x + 1$

(b) $y = \frac{1}{x}$

(c) $y = 2^x$


32. Suppose that some quantity $y$ is a function of some
other quantity $x$, and suppose that $y$ is a mass, i.e. $y$
is measured in pounds, and $x$ is a length, measured in
feet. What units do the increments $\Delta y$ and $\Delta x$, and the
derivative $dy/dx$ have?

33. A tank is filling with water. The volume (in gallons) 
of water in the tank at time t (seconds) is V(t). What 
units does the derivative V'(t) have? 

34. Group Problem. 

Let A(x) be the area of an equilateral triangle whose 
sides measure x inches. 

(a) Show that $\frac{dA}{dx}$ has the units of a length.

(b) Which length does $\frac{dA}{dx}$ represent geometrically?
[Hint: draw two equilateral triangles, one with side $x$ and
another with side $x + \Delta x$. Arrange the triangles so that
they both have the origin as their lower left hand corner,
and so their base is on the $x$-axis.]

35. Group Problem. 

Let $A(x)$ be the area of a square with side $x$, and let
$L(x)$ be the perimeter of the square (sum of the lengths
of all its sides). Using the familiar formulas for $A(x)$ and
$L(x)$ show that $A'(x) = \frac{1}{2} L(x)$.

Give a geometric interpretation that explains why
$\Delta A \approx \frac{1}{2} L(x)\,\Delta x$ for small $\Delta x$.

36. Let A(r) be the area enclosed by a circle of radius 
r, and let L(r ) be the length of the circle. Show that 
A'(r) = L(r). (Use the familiar formulas from geometry 
for the area and perimeter of a circle.) 

37. Let $V(r)$ be the volume enclosed by a sphere of radius
$r$, and let $S(r)$ be its surface area. Show that
$V'(r) = S(r)$. (Use the formulas $V(r) = \frac{4}{3}\pi r^3$ and
$S(r) = 4\pi r^2$.)
Limits and Continuous Functions


1. Informal definition of limits 


While it is easy to define precisely in a few words what a square root is ($\sqrt{a}$ is the nonnegative number whose
square is $a$) the definition of the limit of a function runs over several terse lines, and most people don’t find it
very enlightening when they first see it. (See §2.) So we postpone this for a while and fine tune our intuition
for another page.


1.1. Definition of limit (1st attempt). If $f$ is some function then

    $\lim_{x \to a} f(x) = L$

is read “the limit of $f(x)$ as $x$ approaches $a$ is $L$.” It means that if you choose values of $x$ which are close but
not equal to $a$, then $f(x)$ will be close to the value $L$; moreover, $f(x)$ gets closer and closer to $L$ as $x$ gets
closer and closer to $a$.

The following alternative notation is sometimes used

    $f(x) \to L$ as $x \to a$;

(read “$f(x)$ approaches $L$ as $x$ approaches $a$” or “$f(x)$ goes to $L$ as $x$ goes to $a$”.)

1.2. Example. If $f(x) = x + 3$ then

    $\lim_{x \to 4} f(x) = 7$

is true, because if you substitute numbers $x$ close to 4 in $f(x) = x + 3$ the result will be close to 7.


1.3. Example: substituting numbers to guess a limit. What (if anything) is

    $\lim_{x \to 2} \frac{x^2 - 2x}{x^2 - 4}$?

Here $f(x) = (x^2 - 2x)/(x^2 - 4)$ and $a = 2$.

We first try to substitute $x = 2$, but this leads to

    $f(2) = \frac{2^2 - 2 \cdot 2}{2^2 - 4} = \frac{0}{0}$

which does not exist. Next we try to substitute values of $x$ close but not equal to 2. Table 1 suggests that
$f(x)$ approaches 0.5.


         $x$         $f(x)$             $x$         $g(x)$
    3.000000    0.600000        1.000000    1.009990
    2.500000    0.555556        0.500000    1.009980
    2.100000    0.512195        0.100000    1.009899
    2.010000    0.501247        0.010000    1.008991
    2.001000    0.500125        0.001000    1.000000

Table 1. Finding limits by substituting values of $x$ “close to $a$.” (Values of $f(x)$ and $g(x)$ rounded to
six decimals.)
1.4. Example: Substituting numbers can suggest the wrong answer. The previous example
shows that our first definition of “limit” is not very precise, because it says “$x$ close to $a$,” but how close is
close enough? Suppose we had taken the function

    $g(x) = \frac{101\,000\,x}{100\,000\,x + 1}$

and we had asked for the limit $\lim_{x \to 0} g(x)$.

Then substitution of some “small values of $x$” could lead us to believe that the limit is 1.000.... Only
when you substitute even smaller values do you find that the limit is 0 (zero)!

See also problem 29.
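The entries of Table 1 are easy to regenerate, and extending the second column shows where the guess goes wrong. In this sketch the Python names `f` and `g` simply mirror the two functions of Examples 1.3 and 1.4; the extra inputs below 0.001 are additions made here for illustration.

```python
def f(x):
    return (x**2 - 2 * x) / (x**2 - 4)

def g(x):
    return 101000 * x / (100000 * x + 1)

# Left half of Table 1: x approaching 2.
for x in (3.0, 2.5, 2.1, 2.01, 2.001):
    print(f"{x:.6f}  {f(x):.6f}")

# Right half of Table 1, continued with smaller values of x.
for x in (1.0, 0.5, 0.1, 0.01, 0.001, 1e-5, 1e-7, 1e-9):
    print(f"{x:.9f}  {g(x):.6f}")
```

The values of `g` hover near 1 for the inputs that appear in the table and only drop toward 0 once $x$ is much smaller than $1/100\,000$, which is why a handful of substitutions cannot replace a precise definition.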


2. The formal, authoritative, definition of limit 

The informal description of the limit uses phrases like “closer and closer” and “really very small.” In
the end we don’t really know what they mean, although they are suggestive. “Fortunately” there is a good
definition, i.e. one which is unambiguous and can be used to settle any dispute about the question of whether
$\lim_{x \to a} f(x)$ equals some number $L$ or not. Here is the definition. It takes a while to digest, so read it once,
look at the examples, do a few exercises, read the definition again. Go on to the next sections. Throughout
the semester come back to this section and read it again.


2.1. Definition of $\lim_{x \to a} f(x) = L$. We say that $L$ is the limit of $f(x)$ as $x \to a$, if

(1) $f(x)$ need not be defined at $x = a$, but it must be defined for all other $x$ in some interval which
contains $a$.

(2) for every $\varepsilon > 0$ one can find a $\delta > 0$ such that for all $x$ in the domain of $f$ one has

(8)    $|x - a| < \delta$ implies $|f(x) - L| < \varepsilon$.


Why the absolute values? The quantity $|x - y|$ is the distance between the points $x$ and $y$ on the
number line, and one can measure how close $x$ is to $y$ by calculating $|x - y|$. The inequality $|x - y| < \delta$ says
that “the distance between $x$ and $y$ is less than $\delta$,” or that “$x$ and $y$ are closer than $\delta$.”

What are $\varepsilon$ and $\delta$? The quantity $\varepsilon$ is how close you would like $f(x)$ to be to its limit $L$; the quantity $\delta$
is how close you have to choose $x$ to $a$ to achieve this. To prove that $\lim_{x \to a} f(x) = L$ you must assume that
someone has given you an unknown $\varepsilon > 0$, and then find a positive $\delta$ for which (8) holds. The $\delta$ you find will
depend on $\varepsilon$.


2.2. Show that $\lim_{x \to 5} 2x + 1 = 11$. We have $f(x) = 2x + 1$, $a = 5$ and $L = 11$, and the question we
must answer is “how close should $x$ be to 5 if we want to be sure that $f(x) = 2x + 1$ differs less than $\varepsilon$ from
$L = 11$?”

To figure this out we try to get an idea of how big $|f(x) - L|$ is:

    $|f(x) - L| = |(2x + 1) - 11| = |2x - 10| = 2 \cdot |x - 5| = 2 \cdot |x - a|.$

So, if $2|x - a| < \varepsilon$ then we have $|f(x) - L| < \varepsilon$, i.e.

    if $|x - a| < \frac{1}{2}\varepsilon$ then $|f(x) - L| < \varepsilon$.

We can therefore choose $\delta = \frac{1}{2}\varepsilon$. No matter what $\varepsilon > 0$ we are given our $\delta$ will also be positive, and if
$|x - 5| < \delta$ then we can guarantee $|(2x + 1) - 11| < \varepsilon$. That shows that $\lim_{x \to 5} 2x + 1 = 11$.



2.3. The limit $\lim_{x \to 1} x^2 = 1$ and the “don’t choose $\delta > 1$” trick. We have $f(x) = x^2$, $a = 1$,
$L = 1$, and again the question is, “how small should $|x - 1|$ be to guarantee $|x^2 - 1| < \varepsilon$?”

We begin by estimating the difference $|x^2 - 1|$:

    $|x^2 - 1| = |(x - 1)(x + 1)| = |x + 1| \cdot |x - 1|.$
Propagation of errors - another interpretation of $\varepsilon$ and $\delta$

According to the limit definition “$\lim_{x \to R} \pi x^2 = A$” is true if for every $\varepsilon > 0$ you can find a $\delta > 0$ such that
$|x - R| < \delta$ implies $|\pi x^2 - A| < \varepsilon$. Here's a more concrete situation in which $\varepsilon$ and $\delta$ appear in exactly the same
roles:

Suppose you are given a circle drawn on a piece of
paper, and you want to know its area. You decide to
measure its radius, $R$, and then compute the area of
the circle by calculating

    Area $= \pi R^2$.

The area is a function of the radius, and we’ll call
that function $f$:

    $f(x) = \pi x^2$.

When you measure the radius $R$ you will make
an error, simply because you can never measure anything
with infinite precision. Suppose that $R$ is the
real value of the radius, and that $x$ is the number you
measured. Then the size of the error you made is

    error in radius measurement $= |x - R|$.

When you compute the area you also won’t get the
exact value: you would get $f(x) = \pi x^2$ instead of
$A = f(R) = \pi R^2$. The error in your computed value
of the area is

    error in area $= |f(x) - f(R)| = |f(x) - A|$.

Now you can ask the following question:

    Suppose you want to know the area
    with an error of at most $\varepsilon$,
    then what is the largest error
    that you can afford to make
    when you measure the radius?

The answer will be something like this: if you want
the computed area to have an error of at most $\varepsilon$, i.e.
$|f(x) - A| < \varepsilon$, then the error in your radius measurement
should satisfy $|x - R| < \delta$. You have to do
the algebra with inequalities to compute $\delta$ when you
know $\varepsilon$, as in the examples in this section.

You would expect that if your measured radius
$x$ is close enough to the real value $R$, then your computed
area $f(x) = \pi x^2$ will be close to the real area
$A$.

In terms of $\varepsilon$ and $\delta$ this means that you would
expect that no matter how accurately you want to
know the area (i.e. how small you make $\varepsilon$) you can
always achieve that precision by making the error
in your radius measurement small enough (i.e. by
making $\delta$ sufficiently small).


As $x$ approaches 1 the factor $|x - 1|$ becomes small, and if the other factor $|x + 1|$ were a constant (e.g. 2 as
in the previous example) then we could find $\delta$ as before, by dividing $\varepsilon$ by that constant.

Here is a trick that allows you to replace the factor $|x + 1|$ with a constant. We hereby agree that we
always choose our $\delta$ so that $\delta \le 1$. If we do that, then we will always have

    $|x - 1| < \delta \le 1$, i.e. $|x - 1| < 1$,

and $x$ will always be between 0 and 2. Therefore

    $|x^2 - 1| = |x + 1| \cdot |x - 1| \le 3|x - 1|.$

If we now want to be sure that $|x^2 - 1| < \varepsilon$, then this calculation shows that we should require $3|x - 1| < \varepsilon$,
i.e. $|x - 1| < \frac{1}{3}\varepsilon$. So we should choose $\delta \le \frac{1}{3}\varepsilon$. We must also live up to our promise never to choose $\delta > 1$, so
if we are handed an $\varepsilon$ for which $\frac{1}{3}\varepsilon > 1$, then we choose $\delta = 1$ instead of $\delta = \frac{1}{3}\varepsilon$. To summarize, we are going
to choose

    $\delta$ = the smaller of 1 and $\frac{1}{3}\varepsilon$.

We have shown that if you choose $\delta$ this way, then $|x - 1| < \delta$ implies $|x^2 - 1| < \varepsilon$, no matter what $\varepsilon > 0$ is.

The expression “the smaller of $a$ and $b$” shows up often, and is abbreviated to $\min(a, b)$. We could
therefore say that in this problem we will choose $\delta$ to be

    $\delta = \min(1, \frac{1}{3}\varepsilon).$
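A choice such as $\delta = \min(1, \frac{1}{3}\varepsilon)$ can be spot-checked on a computer. The sketch below is a numerical sanity check, not a proof: the names `delta_for` and `check`, the sampling scheme, and the list of $\varepsilon$ values are all assumptions made here. It samples points $x$ with $0 < |x - 1| < \delta$ and tests whether $|x^2 - 1| < \varepsilon$ holds at each of them.

```python
def delta_for(eps):
    """The choice of delta discussed above: the smaller of 1 and eps/3."""
    return min(1.0, eps / 3.0)

def check(eps, samples=1000):
    """Test |x**2 - 1| < eps at sample points with 0 < |x - 1| < delta."""
    delta = delta_for(eps)
    for k in range(1, samples):
        offset = delta * k / samples          # 0 < offset < delta
        for x in (1.0 - offset, 1.0 + offset):
            if not abs(x**2 - 1.0) < eps:
                return False
    return True

for eps in (10.0, 1.0, 0.1, 1e-3, 1e-6):
    print(eps, delta_for(eps), check(eps))
```

A sampled check like this can reveal a bad choice of $\delta$, but only the inequality argument shows that the choice works for every $x$ and every $\varepsilon > 0$.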
2.4. Show that $\lim_{x \to 4} 1/x = 1/4$. Solution: We apply the definition with $a = 4$, $L = 1/4$ and
$f(x) = 1/x$. Thus, for any $\varepsilon > 0$ we try to show that if $|x - 4|$ is small enough then one has $|f(x) - 1/4| < \varepsilon$.

We begin by estimating $|f(x) - \frac{1}{4}|$ in terms of $|x - 4|$:

    $|f(x) - 1/4| = \left|\frac{1}{x} - \frac{1}{4}\right| = \left|\frac{4 - x}{4x}\right| = \frac{|x - 4|}{|4x|} = \frac{1}{|4x|}\,|x - 4|.$

As before, things would be easier if $1/|4x|$ were a constant. To achieve that we again agree not to take $\delta > 1$.
If we always have $\delta \le 1$, then we will always have $|x - 4| < 1$, and hence $3 < x < 5$. How large can $1/|4x|$ be
in this situation? Answer: the quantity $1/|4x|$ increases as you decrease $x$, so if $3 < x < 5$ then it will never
be larger than $1/|4 \cdot 3| = \frac{1}{12}$.

We see that if we never choose $\delta > 1$, we will always have

    $|f(x) - \frac{1}{4}| \le \frac{1}{12}|x - 4|$ for $|x - 4| < \delta$.

To guarantee that $|f(x) - \frac{1}{4}| < \varepsilon$ we could therefore require

    $\frac{1}{12}|x - 4| < \varepsilon$, i.e. $|x - 4| < 12\varepsilon$.

Hence if we choose $\delta = 12\varepsilon$ or any smaller number, then $|x - 4| < \delta$ implies $|f(x) - \frac{1}{4}| < \varepsilon$. Of course we have
to honor our agreement never to choose $\delta > 1$, so our choice of $\delta$ is

    $\delta$ = the smaller of 1 and $12\varepsilon$ = $\min(1, 12\varepsilon)$.
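The agreement never to take $\delta > 1$ is not a formality, and a small experiment shows why. In this sketch the names `err`, `delta_capped` and `worst_error`, the value $\varepsilon = 1$, and the test point $x = 0.1$ are all chosen here for illustration. With $\varepsilon = 1$, the uncapped choice $\delta = 12\varepsilon = 12$ would admit $x = 0.1$, where $1/x$ is nowhere near $1/4$.

```python
def err(x):
    """Distance between 1/x and the claimed limit 1/4."""
    return abs(1.0 / x - 0.25)

def delta_capped(eps):
    """The choice of delta discussed above: the smaller of 1 and 12*eps."""
    return min(1.0, 12.0 * eps)

def worst_error(delta, samples=1000):
    """Largest err(x) over sample points with 0 < |x - 4| < delta."""
    worst = 0.0
    for k in range(1, samples):
        offset = delta * k / samples
        worst = max(worst, err(4.0 - offset), err(4.0 + offset))
    return worst

eps = 1.0
x_bad = 0.1                       # |x_bad - 4| = 3.9, which is less than 12 * eps
print(err(x_bad) < eps)           # the uncapped delta = 12 fails at this point
print(worst_error(delta_capped(eps)) < eps)
```

With the cap in place every sampled $x$ lies between 3 and 5, which is exactly the range on which the bound $1/|4x| \le \frac{1}{12}$ was derived.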


3. Exercises 


38. Group Problem. 

Joe offers to make square sheets of paper for Bruce.
Given $x > 0$ Joe plans to mark off a length $x$ and cut
out a square of side $x$. Bruce asks Joe for a square with
area 4 square feet. Joe tells Bruce that he can't measure
exactly 2 feet and the area of the square he produces will
only be approximately 4 square feet. Bruce doesn't mind
as long as the area of the square doesn't differ more than
0.01 square foot from what he really asked for (namely, 4
square feet).

(a) What is the biggest error Joe can afford to make
when he marks off the length $x$?

(b) Jen also wants square sheets, with area 4 square
feet. However, she needs the error in the area to be less
than 0.00001 square foot. (She's paying.)

How accurately must Joe measure the side of the
squares he’s going to cut for Jen?

Use the $\varepsilon$-$\delta$ definition to prove the following limits.

39. $\lim_{x\to 1} 2x - 4 = 6$

40. $\lim_{x\to 2} x^2 = 4$.

41. $\lim_{x\to 2} x^2 - 7x + 3 = -7$

42. $\lim_{x\to 3} x^3 = 27$

43. $\lim_{x\to 2} x^3 + 6x^2 = 32$.

44. $\lim_{x\to 4} \sqrt{x} = 2$.

45. 

lim \Jx + 6 = 1 

x—±2> 

46. 

+ X = \ 
m->2 4 + X 2 

47. 

,. 2 — x j 

i™ 4 — x = 3 

48. 

lim X - 1. 


49. $\lim_{x\to 0} \sqrt{|x|} = 0$

50. Group Problem. 

(Joe goes cubic.) Joe is offering to build cubes of
side x. Airline regulations allow you to take a cube on board
provided its volume and surface area add up to less than 33
(everything measured in feet). For instance, a cube with
2 foot sides has volume+area equal to $2^3 + 6 \times 2^2 = 32$.

If you ask Joe to build a cube whose volume plus
total surface area is 32 cubic feet with an error of at
most $\varepsilon$, then what error can he afford to make when he
measures the side of the cube he's making?


51. Our definition of a derivative in (7) contains a limit.
What is the function “f” there, and what is the variable?


4. Variations on the limit theme 

Not all limits are “for $x \to a$.” Here we describe some possible variations on the concept of limit.
4.1. Left and right limits. When we let “x approach a” we allow x to be either larger or smaller than
a, as long as x gets close to a. If we explicitly want to study the behaviour of f(x) as x approaches a through
values larger than a, then we write

\[ \lim_{x\searrow a} f(x) \quad\text{or}\quad \lim_{x\to a+} f(x) \quad\text{or}\quad \lim_{x\to a+0} f(x) \quad\text{or}\quad \lim_{x\to a,\, x>a} f(x). \]

All four notations are in use. Similarly, to designate the value which f(x) approaches as x approaches a
through values below a one writes

\[ \lim_{x\nearrow a} f(x) \quad\text{or}\quad \lim_{x\to a-} f(x) \quad\text{or}\quad \lim_{x\to a-0} f(x) \quad\text{or}\quad \lim_{x\to a,\, x<a} f(x). \]

The precise definition of right limits goes like this:

4.2. Definition of right-limits. Let f be a function. Then

\[ \lim_{x\searrow a} f(x) = L \tag{9} \]

means that for every $\varepsilon > 0$ one can find a $\delta > 0$ such that

\[ a < x < a + \delta \implies |f(x) - L| < \varepsilon \]

holds for all x in the domain of f.

The left-limit, i.e. the one-sided limit in which x approaches a through values less than a, is defined in a
similar way. The following theorem tells you how to use one-sided limits to decide if a function f(x) has a
limit at x = a.


4.3. Theorem. If both one-sided limits

\[ \lim_{x\searrow a} f(x) = L_+, \quad\text{and}\quad \lim_{x\nearrow a} f(x) = L_- \]

exist, then

\[ \lim_{x\to a} f(x) \text{ exists} \iff L_+ = L_-. \]

In other words, if a function has both left- and right-limits at some x = a, then that function has a limit
at x = a if and only if the left- and right-limits are equal.


4.4. Limits at infinity. Instead of letting x approach some finite number, one can let x become “larger
and larger” and ask what happens to f(x). If there is a number L such that f(x) gets arbitrarily close to L
if one chooses x sufficiently large, then we write

\[ \lim_{x\to\infty} f(x) = L, \quad\text{or}\quad \lim_{x\uparrow\infty} f(x) = L, \quad\text{or}\quad \lim_{x\nearrow\infty} f(x) = L. \]

(“The limit for x going to infinity is L.”)


4.5. Example: Limit of 1/x.

The larger you choose x, the smaller its reciprocal 1/x becomes. Therefore, it seems reasonable to say

\[ \lim_{x\to\infty} \frac{1}{x} = 0. \]

Here is the precise definition:


4.6. Definition of limit at $\infty$. Let f be some function which is defined on some interval $x_0 < x < \infty$.
If there is a number L such that for every $\varepsilon > 0$ one can find an A such that

\[ x > A \implies |f(x) - L| < \varepsilon \]

for all x, then we say that the limit of f(x) for $x \to \infty$ is L.

The definition is very similar to the original definition of the limit. Instead of $\delta$ which specifies how close
x should be to a, we now have a number A which says how large x should be, which is a way of saying “how
close x should be to infinity.”
4.7. Example: Limit of 1/x (again). To prove that $\lim_{x\to\infty} 1/x = 0$ we apply the definition to
$f(x) = 1/x$, $L = 0$.

For given $\varepsilon > 0$ we need to show that

\[ \left|\frac{1}{x} - L\right| < \varepsilon \quad\text{for all } x > A \tag{10} \]

provided we choose the right A.

How do we choose A? A is not allowed to depend on x, but it may depend on $\varepsilon$.

If we assume for now that we will only consider positive values of x, then (10) simplifies to

\[ \frac{1}{x} < \varepsilon \]

which is equivalent to

\[ x > \frac{1}{\varepsilon}. \]

This tells us how to choose A. Given any positive $\varepsilon$, we will simply choose

\[ A = \frac{1}{\varepsilon}. \]

Then one has $|\frac{1}{x} - 0| = \frac{1}{x} < \varepsilon$ for all $x > A$. Hence we have proved that $\lim_{x\to\infty} 1/x = 0$.
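As a small numerical illustration of this example (a sketch only, with made-up helper names `threshold` and `check`), one can pick $A = 1/\varepsilon$ and test the inequality $|1/x - 0| < \varepsilon$ at a number of sample points $x > A$:

```python
def threshold(eps: float) -> float:
    """The choice from the example: A = 1/eps."""
    return 1.0 / eps


def check(eps: float, steps: int = 1000) -> bool:
    """Test |1/x - 0| < eps at sample points x that are all larger than A."""
    A = threshold(eps)
    xs = [A * (1.0 + k / 10.0) for k in range(1, steps)]
    return all(abs(1.0 / x - 0.0) < eps for x in xs)


for eps in (0.5, 0.01, 1e-6):
    print(eps, threshold(eps), check(eps))
```

Note that A depends only on $\varepsilon$, never on x, exactly as the definition requires.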


5. Properties of the Limit 

The precise definition of the limit is not easy to use, and fortunately we won’t use it very often in this 
class. Instead, there are a number of properties that limits have which allow you to compute them without 
having to resort to “epsiloncy.” 

The following properties also apply to the variations on the limit from §4, i.e. the following statements
remain true if one replaces each limit by a one-sided limit, or a limit for $x \to \infty$.


Limits of constants and of x.

If a and c are constants, then

\[ \lim_{x\to a} c = c \tag{$P_1$} \]

and

\[ \lim_{x\to a} x = a. \tag{$P_2$} \]

Limits of sums, products and quotients. Let $F_1$ and $F_2$ be two given functions whose limits for
$x \to a$ we know,

\[ \lim_{x\to a} F_1(x) = L_1, \qquad \lim_{x\to a} F_2(x) = L_2. \]

Then

\[ \lim_{x\to a} \bigl(F_1(x) + F_2(x)\bigr) = L_1 + L_2, \tag{$P_3$} \]

\[ \lim_{x\to a} \bigl(F_1(x) - F_2(x)\bigr) = L_1 - L_2, \tag{$P_4$} \]

\[ \lim_{x\to a} \bigl(F_1(x) \cdot F_2(x)\bigr) = L_1 \cdot L_2. \tag{$P_5$} \]

Finally, if $\lim_{x\to a} F_2(x) \neq 0$,

\[ \lim_{x\to a} \frac{F_1(x)}{F_2(x)} = \frac{L_1}{L_2}. \tag{$P_6$} \]


In other words the limit of the sum is the sum of the limits, etc. One can prove these laws using the
definition of limit in §2 but we will not do this here. However, I hope these laws seem like common sense:
if, for x close to a, the quantity $F_1(x)$ is close to $L_1$ and $F_2(x)$ is close to $L_2$, then certainly $F_1(x) + F_2(x)$
should be close to $L_1 + L_2$.

There are two more properties of limits which we will add to this list later on. They are the “Sandwich
Theorem” (§9) and the substitution theorem (§11).


6. Examples of limit computations 


6.1. Find $\lim_{x\to 2} x^2$. One has

\[ \lim_{x\to 2} x^2 = \lim_{x\to 2} x \cdot x = \bigl(\lim_{x\to 2} x\bigr) \cdot \bigl(\lim_{x\to 2} x\bigr) \quad\text{by } (P_5) \quad = 2 \cdot 2 = 4. \]

Similarly,

\[ \lim_{x\to 2} x^3 = \lim_{x\to 2} x \cdot x^2 = \bigl(\lim_{x\to 2} x\bigr) \cdot \bigl(\lim_{x\to 2} x^2\bigr) \quad (P_5) \text{ again} \quad = 2 \cdot 4 = 8, \]

and, by $(P_4)$,

\[ \lim_{x\to 2} x^2 - 1 = \lim_{x\to 2} x^2 - \lim_{x\to 2} 1 = 4 - 1 = 3, \]

and, by $(P_4)$ again,

\[ \lim_{x\to 2} x^3 - 1 = \lim_{x\to 2} x^3 - \lim_{x\to 2} 1 = 8 - 1 = 7. \]

Putting all this together, one gets

\[ \lim_{x\to 2} \frac{x^3 - 1}{x^2 - 1} = \frac{2^3 - 1}{2^2 - 1} = \frac{8 - 1}{4 - 1} = \frac{7}{3} \]

because of $(P_6)$. To apply $(P_6)$ we must check that the denominator (“$L_2$”) is not zero. Since the denominator
is 3 everything is OK, and we were allowed to use $(P_6)$.


6.2. Try the examples 1.3 and 1.4 using the limit properties. To compute $\lim_{x\to 2} (x^2 - 2x)/(x^2 - 4)$
we first use the limit properties to find

\[ \lim_{x\to 2} x^2 - 2x = 0 \quad\text{and}\quad \lim_{x\to 2} x^2 - 4 = 0. \]

To complete the computation we would like to apply the last property $(P_6)$ about quotients, but this would
give us

\[ \lim_{x\to 2} f(x) = \frac{0}{0}. \]

The denominator is zero, so we were not allowed to use $(P_6)$ (and the result doesn’t mean anything anyway).
We have to do something else.

The function we are dealing with is a rational function, which means that it is the quotient of two
polynomials. For such functions there is an algebra trick which always allows you to compute the limit even
if you first get $\frac{0}{0}$. The thing to do is to divide numerator and denominator by $x - 2$. In our case we have

\[ x^2 - 2x = (x - 2) \cdot x, \qquad x^2 - 4 = (x - 2) \cdot (x + 2) \]

so that

\[ \lim_{x\to 2} f(x) = \lim_{x\to 2} \frac{(x - 2) \cdot x}{(x - 2) \cdot (x + 2)} = \lim_{x\to 2} \frac{x}{x + 2}. \]

After this simplification we can use the properties $(P_*)$ to compute

\[ \lim_{x\to 2} f(x) = \frac{2}{2 + 2} = \frac{1}{2}. \]
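The cancellation trick can be illustrated with exact rational arithmetic. This is a sketch using Python's standard `fractions` module; the function names `f` and `simplified` are chosen here for illustration only. Away from x = 2 the original quotient and the simplified one agree exactly, and only the simplified formula can be evaluated at x = 2.

```python
from fractions import Fraction


def f(x: Fraction) -> Fraction:
    """The original quotient (x^2 - 2x)/(x^2 - 4); undefined at x = 2 and x = -2."""
    return (x * x - 2 * x) / (x * x - 4)


def simplified(x: Fraction) -> Fraction:
    """The quotient after cancelling the common factor x - 2."""
    return x / (x + 2)


for k in (1, 3, 6):
    x = 2 + Fraction(1, 10**k)          # a point close to, but not equal to, 2
    print(x, f(x) == simplified(x), float(f(x)))

print(simplified(Fraction(2)))          # the simplified formula evaluated at x = 2
```

Evaluating `f` at exactly x = 2 would raise a `ZeroDivisionError`, which mirrors the meaningless expression 0/0 in the text.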
6.3. Example: Find $\lim_{x\to 2} \sqrt{x}$. Of course, you would think that $\lim_{x\to 2} \sqrt{x} = \sqrt{2}$, and you can
indeed prove this using $\delta$ & $\varepsilon$ (see problem 44). But is there an easier way? There is nothing in the limit
properties which tells us how to deal with a square root, and using them we can’t even prove that there is a
limit. However, if you assume that the limit exists then the limit properties allow us to find this limit.

The argument goes like this: suppose that there is a number L with

\[ \lim_{x\to 2} \sqrt{x} = L. \]

Then property $(P_5)$ implies that

\[ L^2 = \bigl(\lim_{x\to 2} \sqrt{x}\bigr) \cdot \bigl(\lim_{x\to 2} \sqrt{x}\bigr) = \lim_{x\to 2} \sqrt{x} \cdot \sqrt{x} = \lim_{x\to 2} x = 2. \]

In other words, $L^2 = 2$, and hence L must be either $\sqrt{2}$ or $-\sqrt{2}$. We can reject the latter because whatever x
does, its square root is always a positive number, and hence it can never “get close to” a negative number like
$-\sqrt{2}$.

Our conclusion: if the limit exists, then

\[ \lim_{x\to 2} \sqrt{x} = \sqrt{2}. \]

The result is not surprising: if x gets close to 2 then $\sqrt{x}$ gets close to $\sqrt{2}$.


6.4. Example: The derivative of $\sqrt{x}$ at x = 2. Find

\[ \lim_{x\to 2} \frac{\sqrt{x} - \sqrt{2}}{x - 2} \]

assuming the result from the previous example.

Solution: The function is a fraction whose numerator and denominator vanish when x = 2, i.e. the limit
is of the form $\frac{0}{0}$. We use the same algebra trick as before, namely we factor numerator and denominator:

\[ \frac{\sqrt{x} - \sqrt{2}}{x - 2} = \frac{\sqrt{x} - \sqrt{2}}{(\sqrt{x} - \sqrt{2})(\sqrt{x} + \sqrt{2})} = \frac{1}{\sqrt{x} + \sqrt{2}}. \]

Now one can use the limit properties to compute

\[ \lim_{x\to 2} \frac{\sqrt{x} - \sqrt{2}}{x - 2} = \lim_{x\to 2} \frac{1}{\sqrt{x} + \sqrt{2}} = \frac{1}{2\sqrt{2}} = \frac{\sqrt{2}}{4}. \]
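A quick floating-point experiment agrees with this value. The sketch below (the name `difference_quotient` is an illustrative choice) evaluates the quotient at points approaching 2 from both sides and prints the distance to $\sqrt{2}/4$; it is a plausibility check and not a proof.

```python
import math


def difference_quotient(x: float) -> float:
    """(sqrt(x) - sqrt(2)) / (x - 2), defined for x >= 0 with x != 2."""
    return (math.sqrt(x) - math.sqrt(2.0)) / (x - 2.0)


target = math.sqrt(2.0) / 4.0
for h in (1e-1, 1e-3, 1e-5):
    for x in (2.0 - h, 2.0 + h):
        q = difference_quotient(x)
        print(x, q, abs(q - target))
```

The printed differences shrink roughly in proportion to the distance $|x - 2|$.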


6.5. Limit as $x \to \infty$ of rational functions. A rational function is the quotient of two polynomials,
so

\[ R(x) = \frac{a_n x^n + \cdots + a_1 x + a_0}{b_m x^m + \cdots + b_1 x + b_0}. \tag{11} \]

We have seen that

\[ \lim_{x\to\infty} \frac{1}{x} = 0. \]

We even proved this in example 4.7. Using this you can find the limit at $\infty$ for any rational function R(x) as
in (11). One could turn the outcome of the calculation of $\lim_{x\to\infty} R(x)$ into a recipe/formula involving the
degrees n and m of the numerator and denominator, and also their coefficients $a_i$, $b_j$, which students would
then memorize, but it is better to remember “the trick.”

To find $\lim_{x\to\infty} R(x)$ divide numerator and denominator by $x^m$ (the highest power of x occurring in the
denominator).

For example, let’s compute

\[ \lim_{x\to\infty} \frac{3x^2 + 3}{5x^2 + 7x - 39}. \]
Remember the trick and divide top and bottom by $x^2$, and you get

\[ \lim_{x\to\infty} \frac{3x^2 + 3}{5x^2 + 7x - 39} = \lim_{x\to\infty} \frac{3 + 3/x^2}{5 + 7/x - 39/x^2} = \frac{\lim_{x\to\infty} 3 + 3/x^2}{\lim_{x\to\infty} 5 + 7/x - 39/x^2} = \frac{3}{5}. \]

Here we have used the limit properties $(P_*)$ to break the limit down into little pieces like $\lim_{x\to\infty} 39/x^2$
which we can compute as follows

\[ \lim_{x\to\infty} 39/x^2 = \lim_{x\to\infty} 39 \cdot \Bigl(\frac{1}{x}\Bigr)^2 = \Bigl(\lim_{x\to\infty} 39\Bigr) \cdot \Bigl(\lim_{x\to\infty} \frac{1}{x}\Bigr)^2 = 39 \cdot 0^2 = 0. \]
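The following sketch evaluates the rational function from this example at a few large values of x, both in its original form and after dividing top and bottom by $x^2$. The names `R` and `R_divided` are illustrative; the point is only to see the values settle near 3/5.

```python
def R(x: float) -> float:
    """The rational function from the example."""
    return (3.0 * x**2 + 3.0) / (5.0 * x**2 + 7.0 * x - 39.0)


def R_divided(x: float) -> float:
    """The same quotient after dividing numerator and denominator by x^2."""
    return (3.0 + 3.0 / x**2) / (5.0 + 7.0 / x - 39.0 / x**2)


for x in (1e1, 1e3, 1e6):
    print(x, R(x), R_divided(x))
```

In the divided form every term except the constants 3 and 5 visibly shrinks as x grows, which is the reason the trick works.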


6.6. Another example with a rational function. Compute

\[ \lim_{x\to\infty} \frac{x}{x^3 + 5}. \]

We apply “the trick” again and divide numerator and denominator by $x^3$. This leads to

\[ \lim_{x\to\infty} \frac{x}{x^3 + 5} = \lim_{x\to\infty} \frac{1/x^2}{1 + 5/x^3} = \frac{\lim_{x\to\infty} 1/x^2}{\lim_{x\to\infty} 1 + 5/x^3} = \frac{0}{1} = 0. \]


To show all possible ways a limit of a rational function can turn out we should do yet another example, 
but that one belongs in the next section (see example 7.6.) 


7. When limits fail to exist 

In the last couple of examples we worried about the possibility that a limit $\lim_{x\to a} g(x)$ actually might
not exist. This can actually happen, and in this section we’ll see a few examples of what failed limits look
like. First let’s agree on what we will call a “failed limit.”

7.1. Definition. If there is no number L such that $\lim_{x\to a} f(x) = L$, then we say that the limit
$\lim_{x\to a} f(x)$ does not exist.

7.2. The sign function near x = 0. The “sign function”¹ is defined by

\[ \operatorname{sign}(x) = \begin{cases} -1 & \text{for } x < 0 \\ 0 & \text{for } x = 0 \\ 1 & \text{for } x > 0 \end{cases} \]

Note that “the sign of zero” is defined to be zero. But does the sign function have a limit at x = 0, i.e. does
$\lim_{x\to 0} \operatorname{sign}(x)$ exist? And is it also zero? The answers are no and no, and here is why: suppose that for
some number L one had

\[ \lim_{x\to 0} \operatorname{sign}(x) = L, \]

then since for arbitrarily small positive values of x one has sign(x) = +1 one would think that L = +1. But
for arbitrarily small negative values of x one has sign(x) = −1, so one would conclude that L = −1. But one
number L can’t be both +1 and −1 at the same time, so there is no such L, i.e. there is no limit.

\[ \lim_{x\to 0} \operatorname{sign}(x) \text{ does not exist.} \]

¹ Some people don’t like the notation sign(x), and prefer to write

\[ g(x) = \frac{x}{|x|} \]

instead of g(x) = sign(x). If you think about this formula for a moment you’ll see that sign(x) = x/|x| for all x ≠ 0. When
x = 0 the quotient x/|x| is of course not defined.
Figure 1. The sign function, y = sign(x).

In this example the one-sided limits do exist, namely,

\[ \lim_{x\searrow 0} \operatorname{sign}(x) = 1 \quad\text{and}\quad \lim_{x\nearrow 0} \operatorname{sign}(x) = -1. \]

All this says is that when x approaches 0 through positive values, its sign approaches +1, while if x goes to 0
through negative values, then its sign approaches −1.
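To see the two one-sided behaviours side by side, here is a minimal sketch that evaluates a hand-written `sign` function (an illustrative implementation of the definition above) along a sequence of positive and a sequence of negative values shrinking to 0.

```python
def sign(x: float) -> int:
    """The sign function as defined in the text, with sign(0) = 0."""
    if x < 0:
        return -1
    if x > 0:
        return 1
    return 0


from_right = [sign(10.0**-k) for k in range(1, 8)]      # x = 0.1, 0.01, ...
from_left = [sign(-(10.0**-k)) for k in range(1, 8)]    # x = -0.1, -0.01, ...

print(from_right)
print(from_left)
print(sign(0.0))
```

The values from the right are all +1, the values from the left are all −1, and neither matches the value at 0 itself, so no single number L can serve as a two-sided limit.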


7.3. The example of the backward sine. Contemplate the limit as $x \to 0$ of the “backward sine,”
i.e.

\[ \lim_{x\to 0} \sin\Bigl(\frac{\pi}{x}\Bigr). \]

When x = 0 the function $f(x) = \sin(\pi/x)$ is not defined, because its definition involves division by x.
What happens to f(x) as $x \to 0$? First, $\pi/x$ becomes larger and larger (“goes to infinity”) as $x \to 0$. Then,
taking the sine, we see that $\sin(\pi/x)$ oscillates between +1 and −1 infinitely often as $x \to 0$. This means
that f(x) gets close to any number between −1 and +1 as $x \to 0$, but that the function f(x) never stays
close to any particular value because it keeps oscillating up and down.



Figure 2. Graph of $y = \sin\frac{\pi}{x}$ for $-3 < x < 3$, $x \neq 0$.


Here again, the limit $\lim_{x\to 0} f(x)$ does not exist. We have arrived at this conclusion by only considering
what f(x) does for small positive values of x. So the limit fails to exist in a stronger way than in the example
of the sign-function. There, even though the limit didn’t exist, the one-sided limits existed. In the present
example we see that even the one-sided limit

\[ \lim_{x\searrow 0} \sin\frac{\pi}{x} \]

does not exist.
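The oscillation can be made concrete by evaluating $\sin(\pi/x)$ along three sequences of positive numbers that all tend to 0. The sequences below are chosen for this illustration: at x = 1/n the argument is $n\pi$, at x = 2/(4n+1) it is $2n\pi + \pi/2$, and at x = 2/(4n+3) it is $2n\pi + 3\pi/2$. The computation is in floating point, so the values are only approximately 0, +1 and −1.

```python
import math


def f(x: float) -> float:
    """The backward sine sin(pi/x), defined for x != 0."""
    return math.sin(math.pi / x)


for n in (1, 10, 100, 1000):
    x_zero = 1.0 / n                # pi/x = n*pi, where the sine vanishes
    x_top = 2.0 / (4 * n + 1)       # pi/x = 2*n*pi + pi/2, where the sine is +1
    x_bottom = 2.0 / (4 * n + 3)    # pi/x = 2*n*pi + 3*pi/2, where the sine is -1
    print(n, round(f(x_zero), 6), round(f(x_top), 6), round(f(x_bottom), 6))
```

All three sequences approach 0 through positive values, yet the function values stay near three different numbers, which is why even the one-sided limit fails.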
7.4. Trying to divide by zero using a limit. The expression 1/0 is not defined, but what about

\[ \lim_{x\to 0} \frac{1}{x}\,? \]

This limit also does not exist. Here are two reasons:

It is common wisdom that if you divide by a small number you get a large number, so as $x \searrow 0$ the
quotient 1/x will not be able to stay close to any particular finite number, and the limit can’t exist.

“Common wisdom” is not always a reliable tool in mathematical proofs, so here is a better argument.
The limit can’t exist, because that would contradict the limit properties $(P_1) \cdots (P_6)$. Namely, suppose that
there were a number L such that

\[ \lim_{x\to 0} \frac{1}{x} = L. \]

Then the limit property $(P_5)$ would imply that

\[ \lim_{x\to 0} \Bigl(\frac{1}{x} \cdot x\Bigr) = \Bigl(\lim_{x\to 0} \frac{1}{x}\Bigr) \cdot \Bigl(\lim_{x\to 0} x\Bigr) = L \cdot 0 = 0. \]

On the other hand $\frac{1}{x} \cdot x = 1$ so the above limit should be 1! A number can’t be both 0 and 1 at the same
time, so we have a contradiction. The assumption that $\lim_{x\to 0} 1/x$ exists is to blame, so it must go.


7.5. Using limit properties to show a limit does not exist. The limit properties tell us how to
prove that certain limits exist (and how to compute them). Although it is perhaps not so obvious at first
sight, they also allow you to prove that certain limits do not exist. The previous example shows one instance
of such use. Here is another.

Property $(P_3)$ says that if both $\lim_{x\to a} g(x)$ and $\lim_{x\to a} h(x)$ exist then $\lim_{x\to a} g(x) + h(x)$ also must
exist. You can turn this around and say that if $\lim_{x\to a} g(x) + h(x)$ does not exist then either $\lim_{x\to a} g(x)$ or
$\lim_{x\to a} h(x)$ does not exist (or both limits fail to exist).

For instance, the limit

\[ \lim_{x\to 0} \frac{1}{x} - x \]

can’t exist, for if it did, then the limit

\[ \lim_{x\to 0} \frac{1}{x} = \lim_{x\to 0} \Bigl(\frac{1}{x} - x + x\Bigr) = \lim_{x\to 0} \Bigl(\frac{1}{x} - x\Bigr) + \lim_{x\to 0} x \]

would also have to exist, and we know $\lim_{x\to 0} \frac{1}{x}$ doesn’t exist.


7.6. Limits at $\infty$ which don’t exist. If you let x go to $\infty$, then x will not get “closer and closer” to
any particular number L, so it seems reasonable to guess that

\[ \lim_{x\to\infty} x \text{ does not exist.} \]

One can prove this from the limit definition (and see exercise 72).

Let’s consider

\[ L = \lim_{x\to\infty} \frac{x^2 + 2x - 1}{x + 2}. \]

Once again we divide numerator and denominator by the highest power in the denominator (i.e. x):

\[ L = \lim_{x\to\infty} \frac{x + 2 - \frac{1}{x}}{1 + 2/x}. \]

Here the denominator has a limit (’tis 1), but the numerator does not, for if $\lim_{x\to\infty} x + 2 - \frac{1}{x}$ existed then,
since $\lim_{x\to\infty} (2 - 1/x) = 2$ exists,

\[ \lim_{x\to\infty} x = \lim_{x\to\infty} \Bigl[\Bigl(x + 2 - \frac{1}{x}\Bigr) - \Bigl(2 - \frac{1}{x}\Bigr)\Bigr] \]

would also have to exist, and $\lim_{x\to\infty} x$ doesn’t exist.

So we see that L is the limit of a fraction in which the denominator has a limit, but the numerator does
not. In this situation the limit L itself can never exist. If it did, then

\[ \lim_{x\to\infty} \Bigl(x + 2 - \frac{1}{x}\Bigr) = \lim_{x\to\infty} \frac{x + 2 - \frac{1}{x}}{1 + 2/x} \cdot (1 + 2/x) \]

would also have to have a limit.


8. What’s in a name? 


There is a big difference between the variables x and a in the formula

\[ \lim_{x\to a} 2x + 1, \]

namely a is a free variable, while x is a dummy variable (or “placeholder” or a “bound variable”).

The difference between these two kinds of variables is this: 

• if you replace a dummy variable in some formula consistently by some other variable then the value 
of the formula does not change. On the other hand, it never makes sense to substitute a number for 
a dummy variable. 

• the value of the formula may depend on the value of the free variable. 

To understand what this means consider the example $\lim_{x\to a} 2x + 1$ again. The limit is easy to compute:

\[ \lim_{x\to a} 2x + 1 = 2a + 1. \]

If we replace x by, say u (systematically) then we get

\[ \lim_{u\to a} 2u + 1 \]

which is again equal to 2a + 1. This computation says that if some number gets close to a then two times 
that number plus one gets close to 2a + 1. This is a very wordy way of expressing the formula, and you 
can shorten things by giving a name (like x or u) to the number which approaches a. But the result of our 
computation shouldn’t depend on the name we choose, i.e. it doesn’t matter if we call it x or u. 

Since the name of the variable x doesn’t matter it is called a dummy variable. Some prefer to call x a
bound variable, meaning that in

\[ \lim_{x\to a} 2x + 1 \]

the x in the expression 2x + 1 is bound to the x written underneath the limit: you can’t change one without
changing the other.

Substituting a number for a dummy variable usually leads to complete nonsense. For instance, let’s try
setting x = 3 in our limit, i.e. what is

\[ \lim_{3\to a} 2 \cdot 3 + 1\,? \]

Of course 2 • 3 + 1 = 7, but what does 7 do when 3 gets closer and closer to the number a? That’s a silly 
question, because 3 is a constant and it doesn’t “get closer” to some other number like a! If you ever see 3 
get closer to another number then it’s time to take a vacation. 

On the other hand the variable a is free: you can assign it particular values, and its value will affect the 
value of the limit. For instance, if we set a = 3 (but leave x alone) then we get 

lim 2x + 1 

x—>3 

and there’s nothing strange about that (the limit is 2 • 3 + 1 = 7, no problem.) You could substitute other 
values of a and you would get a different answer. In general you get 2a + 1. 
9. Limits and Inequalities


This section has two theorems which let you compare limits of different functions. The properties in
these theorems are not formulas that allow you to compute limits like the properties $(P_1) \ldots (P_6)$ from §5.
Instead, they allow you to reason about limits, i.e. they let you say that this or that limit is positive, or that
it must be the same as some other limit which you find easier to think about.

The first theorem should not surprise you - all it says is that bigger functions have bigger limits. 


9.1. Theorem. Let f and g be functions whose limits for $x \to a$ exist, and assume that $f(x) \le g(x)$
holds for all x. Then

\[ \lim_{x\to a} f(x) \le \lim_{x\to a} g(x). \]

A useful special case arises when you set f(x) = 0. The theorem then says that if a function g never has
negative values, then its limit will also never be negative.

The statement may seem obvious, but it still needs a proof, starting from the $\varepsilon$-$\delta$ definition of limit. This
will be done in lecture.


Here is the second theorem about limits and inequalities. 


9.2. The Sandwich Theorem. Suppose that

\[ f(x) \le g(x) \le h(x) \]

(for all x) and that

\[ \lim_{x\to a} f(x) = \lim_{x\to a} h(x). \]

Then

\[ \lim_{x\to a} f(x) = \lim_{x\to a} g(x) = \lim_{x\to a} h(x). \]

The theorem is useful when you want to know the limit of g, and when you can sandwich it between
two functions f and h whose limits are easier to compute. The Sandwich Theorem looks like the first theorem
of this section, but there is an important difference: in the Sandwich Theorem you don’t have to assume that
the limit of g exists. The inequalities $f \le g \le h$ combined with the circumstance that f and h have the same
limit are enough to guarantee that the limit of g exists.



Figure 3. Graphs of $|x|$, $-|x|$ and $x\cos\frac{\pi}{x}$ for $-1.2 < x < 1.2$
9.3. Example: a Backward Cosine Sandwich. The Sandwich Theorem says that if the function
g(x) is sandwiched between two functions f(x) and h(x) and the limits of the outside functions f and h exist
and are equal, then the limit of the inside function g exists and equals this common value. For example

\[ -|x| \le x\cos\frac{1}{x} \le |x| \]

since the cosine is always between −1 and 1. Since

\[ \lim_{x\to 0} -|x| = \lim_{x\to 0} |x| = 0 \]

the sandwich theorem tells us that

\[ \lim_{x\to 0} x\cos\frac{1}{x} = 0. \]

Note that the limit $\lim_{x\to 0} \cos(1/x)$ does not exist, for the same reason that the “backward sine” did not
have a limit for $x \to 0$ (see example 7.3). Multiplying with x changed that.
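The sandwich can be observed numerically. The sketch below (with the illustrative name `g` for the inside function $x\cos(1/x)$) checks the two inequalities at sample points on both sides of 0 and prints a few values of g close to 0. It only samples finitely many points, so it illustrates the theorem rather than proving anything.

```python
import math


def g(x: float) -> float:
    """The inside function x*cos(1/x), defined for x != 0."""
    return x * math.cos(1.0 / x)


# Sample points +-0.1, +-0.01, ..., +-1e-8 on both sides of 0.
xs = [s * 10.0**-k for k in range(1, 9) for s in (1.0, -1.0)]

inside = all(-abs(x) <= g(x) <= abs(x) for x in xs)
print(inside)
for x in xs[-4:]:
    print(x, g(x))
```

The factor $\cos(1/x)$ keeps oscillating, but the factor x squeezes the product towards 0.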


10. Continuity 

10.1. Definition. A function g is continuous at a if

\[ \lim_{x\to a} g(x) = g(a). \tag{12} \]

A function is continuous if it is continuous at every a in its domain.

Note that when we say that a function is continuous on some interval it is understood that the domain
of the function includes that interval. For example, the function $f(x) = 1/x^2$ is continuous on the interval
$1 < x < 5$ but is not continuous on the interval $-1 < x < 1$.


10.2. Polynomials are continuous. For instance, let us show that $P(x) = x^2 + 3x$ is continuous at
x = 2. To show that you have to prove that

\[ \lim_{x\to 2} P(x) = P(2), \]

i.e.

\[ \lim_{x\to 2} x^2 + 3x = 2^2 + 3 \cdot 2. \]

You can do this two ways: using the definition with $\varepsilon$ and $\delta$ (i.e. the hard way), or using the limit properties
$(P_1) \ldots (P_6)$ from §5 (just as good, and easier, even though it still takes a few lines to write it out; do both!)

10.3. Rational functions are continuous. Let $R(x) = \frac{P(x)}{Q(x)}$ be a rational function, and let a be any
number in the domain of R, i.e. any number for which $Q(a) \neq 0$. Then one has

\[ \lim_{x\to a} R(x) = \lim_{x\to a} \frac{P(x)}{Q(x)} \]

\[ = \frac{\lim_{x\to a} P(x)}{\lim_{x\to a} Q(x)} \qquad \text{property } (P_6) \]

\[ = \frac{P(a)}{Q(a)} \qquad \text{P and Q are continuous} \]

\[ = R(a). \]

This shows that R is indeed continuous at a.

10.4. Some discontinuous functions. If $\lim_{x\to a} g(x)$ does not exist, then it certainly cannot be equal
to g(a), and therefore any failed limit provides an example of a discontinuous function.

For instance, the sign function g(x) = sign(x) from example 7.2 is not continuous at x = 0.

What about the backward sine function $g(x) = \sin(1/x)$ from example 7.3 at x = 0? It is not continuous
there, for two reasons: first, the limit $\lim_{x\to 0} \sin(1/x)$ does not exist, and second, we haven’t even defined the
function g(x) at x = 0, so even if the limit existed, we would have no value g(0) to compare it with.
10.5. How to make functions discontinuous. Here is a discontinuous function:

\[ f(x) = \begin{cases} x^2 & \text{if } x \neq 3, \\ 47 & \text{if } x = 3. \end{cases} \]

In other words, we take a continuous function like $g(x) = x^2$, and change its value somewhere, e.g. at x = 3.
Then

\[ \lim_{x\to 3} f(x) = 9 \neq 47 = f(3). \]

The reason that the limit is 9 is that our new function f(x) coincides with our old continuous function g(x)
for all x except x = 3. Therefore the limit of f(x) as $x \to 3$ is the same as the limit of g(x) as $x \to 3$, and
since g is continuous this is g(3) = 9.
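A short sketch makes the mismatch between limit and value visible. The functions `g` and `f` below follow the definitions in this example; the sample points are an arbitrary choice for illustration.

```python
def g(x: float) -> float:
    """The continuous function g(x) = x^2."""
    return x * x


def f(x: float) -> float:
    """g with its value at x = 3 changed to 47."""
    return 47.0 if x == 3.0 else g(x)


# Values of f at points approaching 3 from both sides (never equal to 3).
near = [f(3.0 + s * 10.0**-k) for k in range(1, 7) for s in (1, -1)]
print(near)
print(f(3.0))
```

The values near x = 3 cluster around 9, while the value at x = 3 is 47, so the limit and the function value disagree.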


10.6. Sandwich in a bow tie. We return to the function from example 9.3. Consider

\[ f(x) = \begin{cases} x\cos\bigl(\frac{1}{x}\bigr) & \text{for } x \neq 0, \\ 0 & \text{for } x = 0. \end{cases} \]

Then f is continuous at x = 0 by the Sandwich Theorem (see Example 9.3).

If we change the definition of f by picking a different value at x = 0 the new function will not be
continuous, since changing f at x = 0 does not change the limit $\lim_{x\to 0} f(x)$. Since this limit is zero, f(0) = 0
is the only possible choice of f(0) which makes f continuous at x = 0.


11. Substitution in Limits 

Given two functions f and g one can consider their composition h(x) = f(g(x)). To compute the limit

\[ \lim_{x\to a} f(g(x)) \]

we write u = g(x), so that we want to know

\[ \lim_{x\to a} f(u) \quad\text{where } u = g(x). \]

Suppose that you can find the limits

\[ L = \lim_{x\to a} g(x) \quad\text{and}\quad \lim_{u\to L} f(u) = M. \]

Then it seems reasonable that as x approaches a, u = g(x) will approach L, and f(g(x)) approaches M.
This is in fact a theorem:

11.1. Theorem. If $\lim_{x\to a} g(x) = L$, and if the function f is continuous at u = L, then

\[ \lim_{x\to a} f(g(x)) = \lim_{u\to L} f(u) = f(L). \]

Another way to write this is

\[ \lim_{x\to a} f(g(x)) = f\bigl(\lim_{x\to a} g(x)\bigr). \]
11.2. Example: compute $\lim_{x\to 3} \sqrt{x^3 - 3x^2 + 2}$. The given function is the composition of two functions,
namely

\[ \sqrt{x^3 - 3x^2 + 2} = \sqrt{u}, \quad\text{with } u = x^3 - 3x^2 + 2, \]

or, in function notation, we want to find $\lim_{x\to 3} h(x)$ where

\[ h(x) = f(g(x)), \quad\text{with } g(x) = x^3 - 3x^2 + 2 \text{ and } f(x) = \sqrt{x}. \]

Either way, we have

\[ \lim_{x\to 3} x^3 - 3x^2 + 2 = 2 \quad\text{and}\quad \lim_{u\to 2} \sqrt{u} = \sqrt{2}. \]

You get the first limit from the limit properties $(P_1) \ldots (P_5)$. The second limit says that taking the square
root is a continuous function, which it is. We have not proved that (yet), but this particular limit is the one
from example 6.3. Putting these two limits together we conclude that the limit is $\sqrt{2}$.

Normally, you write this whole argument as follows:

\[ \lim_{x\to 3} \sqrt{x^3 - 3x^2 + 2} = \sqrt{\lim_{x\to 3} x^3 - 3x^2 + 2} = \sqrt{2}, \]

where you must point out that $f(x) = \sqrt{x}$ is a continuous function to justify the first step.

Another possible way of writing this is

\[ \lim_{x\to 3} \sqrt{x^3 - 3x^2 + 2} = \lim_{u\to 2} \sqrt{u} = \sqrt{2}, \]

where you must say that you have substituted $u = x^3 - 3x^2 + 2$.
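The two-step structure of this example (first $u = g(x)$ approaches 2, then $\sqrt{u}$ approaches $\sqrt{2}$) can be traced numerically. The sketch below uses the names g, f and h from the example; the sample points near 3 are an illustrative choice.

```python
import math


def g(x: float) -> float:
    """Inner function u = x^3 - 3x^2 + 2."""
    return x**3 - 3.0 * x**2 + 2.0


def f(u: float) -> float:
    """Outer function, the square root (needs u >= 0)."""
    return math.sqrt(u)


def h(x: float) -> float:
    """The composition h(x) = f(g(x))."""
    return f(g(x))


for k in (1, 3, 6):
    for x in (3.0 - 10.0**-k, 3.0 + 10.0**-k):
        print(x, g(x), h(x))

print(math.sqrt(2.0))
```

As x gets close to 3 the middle column approaches 2 and the last column approaches $\sqrt{2}$, matching the substitution argument.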


Find the following limits. 


52. liirq(2x + 5) 

53. $\lim_{x\to 7-} (2x + 5)$

54. $\lim_{x\to -\infty} (2x + 5)$

55. $\lim_{x\to -4} (x + 3)^{2006}$

56. $\lim_{x\to -4} (x + 3)^{2007}$

57. $\lim_{x\to -\infty} (x + 3)^{2007}$


58. 

lim — 
£->• 1 

59. 

lim — 

tyi 

60. 

lim 

t^-i 

61. 

lim 

x —>-oo 

62. 

lim 

x —>-oo 

63. 

lim 

x—yoo 

64. 

lim 


r +1 - 2 

t 2 -1 

t 2 +t-2 
t 2 - 1 

t 2 +t-2 
t 2 - 1 


x 2 + 3 


x 2 + 4 

x 5 + 3 
x 2 + 4 

x 2 + 1 


( 2 x+ l ) 4 


x—>oo (3x 2 + l ) 2 


12. Exercises 


65. 


, (2-u + l) 4 

(3 v? + l ) 2 


66 . 


(2t + l) 4 


J™o (3 1 2 + l) 2 


67. What are the coordinates of the points labeled A, E in Figure 2 (the graph of $y = \sin \pi/x$)?

68. If $\lim_{x\to a} f(x)$ exists then f is continuous at x = a.
True or false?

69. Give two examples of functions for which $\lim_{x\searrow 0} f(x)$
does not exist.

70. Group Problem.

If $\lim_{x\to 0} f(x)$ and $\lim_{x\to 0} g(x)$ both do not exist,
then $\lim_{x\to 0} (f(x) + g(x))$ also does not exist. True or
false?

71. Group Problem.

If $\lim_{x\to 0} f(x)$ and $\lim_{x\to 0} g(x)$ both do not exist,
then $\lim_{x\to 0} (f(x)/g(x))$ also does not exist. True or
false?

72. Group Problem.

In the text we proved that $\lim_{x\to\infty} \frac{1}{x} = 0$. Show
that this implies that $\lim_{x\to\infty} x$ does not exist. Hint:
Suppose $\lim_{x\to\infty} x = L$ for some number L. Apply the
limit properties to $\lim_{x\to\infty} x \cdot \frac{1}{x}$.

73. Evaluate $\lim_{x\to 9} \frac{\sqrt{x} - 3}{x - 9}$. Hint: Multiply top and bottom
by $\sqrt{x} + 3$.
Proving $\lim_{\theta\to 0} \frac{\sin\theta}{\theta} = 1$

The circular wedge OAC contains the
triangle OAC and is contained in the right
triangle OAB.

The area of triangle OAC is $\tfrac12 \sin\theta$.

The area of circular wedge OAC is $\tfrac12 \theta$.

The area of right triangle OAB is $\tfrac12 \tan\theta$.

Hence one has $\sin\theta < \theta < \tan\theta$ for all
angles $0 < \theta < \pi/2$.

i _ i 

74. Evaluate lim —- -. 

x ->2 x — 2 

jl_ i_ 

75. Evaluate lim —- —. 

x ^>2 x — 2 

76. A function f is defined by

\[ f(x) = \begin{cases} x^3 & \text{for } x < -1 \\ ax + b & \text{for } -1 \le x < 1 \\ x^2 + 2 & \text{for } x \ge 1. \end{cases} \]

where a and b are constants. The function f is continuous.
What are a and b?

77. Find a constant k such that the function

\[ f(x) = \begin{cases} 3x + 2 & \text{for } x < 2 \\ x^2 + k & \text{for } x \ge 2. \end{cases} \]

is continuous. Hint: Compute the one-sided limits.

78. Find constants a and c such that the function

\[ f(x) = \begin{cases} x^3 + c & \text{for } x < 0 \\ ax + c^2 & \text{for } 0 \le x < 1 \\ \arctan x & \text{for } x \ge 1. \end{cases} \]

is continuous for all x.



13. Two Limits in Trigonometry 

In this section we’ll derive a few limits involving the trigonometric functions. You can think of them as
saying that for small angles $\theta$ one has

\[ \sin\theta \approx \theta \quad\text{and}\quad \cos\theta \approx 1 - \tfrac12 \theta^2. \]

We will use these limits when we compute the derivatives of Sine, Cosine and Tangent.


13.1. Theorem. $\lim_{\theta\to 0} \frac{\sin\theta}{\theta} = 1$.

PROOF. The proof requires a few sandwiches and some geometry.

We begin by only considering positive angles, and in fact we will only consider angles $0 < \theta < \pi/2$.

Since the wedge OAC contains the triangle OAC its area must be larger. The area of the wedge is $\tfrac12 \theta$
and the area of the triangle is $\tfrac12 \sin\theta$, so we find that

\[ 0 < \sin\theta < \theta \quad\text{for } 0 < \theta < \frac{\pi}{2}. \tag{13} \]

The Sandwich Theorem implies that

\[ \lim_{\theta\searrow 0} \sin\theta = 0. \tag{14} \]

Moreover, we also have

\[ \lim_{\theta\searrow 0} \cos\theta = \lim_{\theta\searrow 0} \sqrt{1 - \sin^2\theta} = 1. \tag{15} \]

Next we compare the areas of the wedge OAC and the larger triangle OAB. Since OAB has area $\tfrac12 \tan\theta$
we find that

\[ \theta < \tan\theta \]

for $0 < \theta < \frac{\pi}{2}$. Since $\tan\theta = \frac{\sin\theta}{\cos\theta}$ we can multiply with $\cos\theta$ and divide by $\theta$ to get

\[ \cos\theta < \frac{\sin\theta}{\theta} \quad\text{for } 0 < \theta < \frac{\pi}{2}. \]

If we go back to (13) and divide by $\theta$, then we get

\[ \cos\theta < \frac{\sin\theta}{\theta} < 1. \]

The Sandwich Theorem can be used once again, and now it gives

\[ \lim_{\theta\searrow 0} \frac{\sin\theta}{\theta} = 1. \]

This is a one-sided limit. To get the limit in which $\theta \nearrow 0$, you use that $\sin\theta$ is an odd function. □
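The final sandwich in the proof, $\cos\theta < \sin\theta/\theta < 1$, can be watched numerically for a few small positive angles. This is only a floating-point sketch: the sample angles stop at $10^{-6}$ because for much smaller angles the ratio rounds to exactly 1.0 in double precision, which would hide the strict inequality.

```python
import math

# Sample angles 1, 0.1, ..., 1e-6, all inside the interval (0, pi/2).
thetas = [10.0**-k for k in range(0, 7)]

for t in thetas:
    ratio = math.sin(t) / t
    print(t, math.cos(t), ratio, math.cos(t) < ratio < 1.0)
```

Both the lower bound $\cos\theta$ and the ratio itself creep up towards 1 as $\theta$ shrinks, which is exactly what the Sandwich Theorem exploits.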

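13.2. An example. We will show that

\[ \lim_{\theta\to 0} \frac{1 - \cos\theta}{\theta^2} = \frac{1}{2}. \tag{16} \]

This follows from $\sin^2\theta + \cos^2\theta = 1$. Namely,

\[ \frac{1 - \cos\theta}{\theta^2} = \frac{1}{1 + \cos\theta} \cdot \frac{1 - \cos^2\theta}{\theta^2} = \frac{1}{1 + \cos\theta} \cdot \frac{\sin^2\theta}{\theta^2} = \frac{1}{1 + \cos\theta} \Bigl(\frac{\sin\theta}{\theta}\Bigr)^2. \]

We have just shown that $\cos\theta \to 1$ and $\frac{\sin\theta}{\theta} \to 1$ as $\theta \to 0$, so (16) follows.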
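The rewriting used in this example is also the numerically friendlier form, since it avoids subtracting two nearly equal numbers. The sketch below compares the direct quotient with the rewritten one; `direct` and `rewritten` are names chosen for this illustration.

```python
import math


def direct(t: float) -> float:
    """(1 - cos t) / t^2 computed as written."""
    return (1.0 - math.cos(t)) / t**2


def rewritten(t: float) -> float:
    """The equivalent form (sin t / t)^2 / (1 + cos t) from the derivation."""
    return (math.sin(t) / t) ** 2 / (1.0 + math.cos(t))


for t in (1.0, 1e-1, 1e-2, 1e-4):
    print(t, direct(t), rewritten(t))
```

Both columns approach 1/2 as the angle shrinks, in agreement with (16).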

14. Exercises 


Find each of the following limits or show that it does not 
exist. Distinguish between limits which are infinite and 
limits which do not exist. 

79. lim (two ways: with and without the double 

q-»o sma 
angle formula!) 


80. lim 


sin 3* 


81. lim 


o 2x 
tan 0 


0 -s-o 9 

_ , tan 4a 

82. lim -. 

a->o sm2a 

83. lim 1 ~ C0S X 

x—tO 

84. lim 

0—> 7T j 

85. lim 


a-t-O xsmx 

1 — sin 8 


e^-z/2 8 — n /2 

2x 3 + 3x 2 cos x 


(x + 2)3 


86. lim 


87. lim 811 ^ 2 !. 

x 2 

88. lim ~ cos 


89. lim 


o tan 3 x 
sin(® 2 ) 


->o 1 — cos x 
x — ? 


90. lim 

x^rir/2 COS X 

91. lim (x — 5)tan x. 

x—>tt/2 


92. lim 


cosx 


o X 2 + 9' 
sin a; 

93. lim -. 

x^tk X — 7 T 

.. .. sinx 

94. lim -:- . 

a:->-0 X + Sin X 

. .. sinx 

95. A = lim -. 


2 i ->0 1 — cos X 


96. B = lim 


x—yoo x 

COSX 


(■•) 

(!! again) 
97. Is there a constant k such that the function

$$f(x) = \begin{cases} \sin(1/x) & \text{for } x \neq 0 \\ k & \text{for } x = 0 \end{cases}$$

is continuous? If so, find it; if not, say why.

98. Find a constant A so that the function

$$f(x) = \begin{cases} \dfrac{\sin x}{2x} & \text{for } x \neq 0 \\ A & \text{when } x = 0 \end{cases}$$

is continuous everywhere.

99. Compute $\lim_{x\to\infty} x\sin\frac{\pi}{x}$ and $\lim_{x\to\infty} x\tan\frac{\pi}{x}$. (Hint:
substitute something).

100. Group Problem.

(Geometry & Trig review) Let $A_n$ be the area of the
regular n-gon inscribed in the unit circle, and let $B_n$ be
the area of the regular n-gon whose inscribed circle has
radius 1.

(a) Show that $A_n < \pi < B_n$.

(b) Show that

$$A_n = \frac{n}{2}\sin\frac{2\pi}{n} \quad\text{and}\quad B_n = n\tan\frac{\pi}{n}.$$

(c) Compute $\lim_{n\to\infty} A_n$ and $\lim_{n\to\infty} B_n$.

Here is a picture of $A_{12}$, $B_6$ and $\pi$:

On a historical note: Archimedes managed to compute
$A_{96}$ and $B_{96}$ and by doing this got the most accurate
approximation for $\pi$ that was known in his time. See also:

http://www-history.mcs.st-andrews.ac.uk/HistTopics/Pi_through_the_ages.html
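The polygon areas of Problem 100 are easy to tabulate. The sketch below simply evaluates the two formulas from part (b), taking them as given; the function names are our own. Note that the code uses the built-in value of $\pi$, so it illustrates the bounds rather than reproducing what Archimedes did, who had to work from geometry alone.

```python
import math

def inscribed_area(n: int) -> float:
    """Area A_n of the regular n-gon inscribed in the unit circle."""
    return 0.5 * n * math.sin(2.0 * math.pi / n)

def circumscribed_area(n: int) -> float:
    """Area B_n of the regular n-gon whose inscribed circle has radius 1."""
    return n * math.tan(math.pi / n)

for n in (6, 12, 96, 1000):
    print(n, inscribed_area(n), circumscribed_area(n))
```

For n = 96 the two areas already differ by less than one hundredth, and $\pi$ sits between them.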
Derivatives (2)


“Leibniz never thought of the derivative as a limit”


http://www.gap-system.org/~history/Biographies/Leibniz.html

In chapter 2 we saw two mathematical problems which led to expressions of the form $\frac{0}{0}$. Now that we know
how to handle limits, we can state the definition of the derivative of a function. After computing a few
derivatives using the definition we will spend most of this section developing the differential calculus, which is
a collection of rules that allow you to compute derivatives without always having to use the basic definition.

1. Derivatives Defined 


1.1. Definition. Let f be a function which is defined on some interval (c, d) and let a be some number
in this interval.

The derivative of the function f at a is the value of the limit

$$f'(a) = \lim_{x\to a} \frac{f(x) - f(a)}{x - a}. \tag{17}$$

f is said to be differentiable at a if this limit exists.

f is called differentiable on the interval (c, d) if it is differentiable at every point a in (c, d).

1.2. Other notations. One can substitute $x = a + h$ in the limit (17) and let $h \to 0$ instead of $x \to a$.
This gives the formula

$$f'(a) = \lim_{h\to 0} \frac{f(a+h) - f(a)}{h}. \tag{18}$$

Often you will find this equation written with $x$ instead of $a$ and $\Delta x$ instead of $h$, which makes it look like
this:

$$f'(x) = \lim_{\Delta x\to 0} \frac{f(x + \Delta x) - f(x)}{\Delta x}.$$

The interpretation is the same as in equation (6) from §4. The numerator $f(x + \Delta x) - f(x)$ represents the
amount by which the function value of $f$ changes if one increases its argument $x$ by a (small) amount $\Delta x$. If
you write $y = f(x)$ then we can call the increase in $f$

$$\Delta y = f(x + \Delta x) - f(x),$$

so that the derivative $f'(x)$ is

$$f'(x) = \lim_{\Delta x\to 0} \frac{\Delta y}{\Delta x}.$$

Gottfried Wilhelm von Leibniz, one of the inventors of calculus, came up with the idea that one should
write this limit as

$$\frac{dy}{dx} = \lim_{\Delta x\to 0} \frac{\Delta y}{\Delta x},$$

the idea being that after letting $\Delta x$ go to zero it didn’t vanish, but instead became an infinitely small quantity
which Leibniz called “dx.” The result of increasing $x$ by this infinitely small quantity $dx$ is that $y = f(x)$
increased by another infinitely small quantity $dy$. The ratio of these two infinitely small quantities is what we
call the derivative of $y = f(x)$.
There are no “infinitely small real numbers,” and this makes Leibniz’ notation difficult to justify. In the
20th century mathematicians have managed to create a consistent theory of “infinitesimals” which allows you
to compute with “dx and dy” as Leibniz and his contemporaries would have done. This theory is called “non
standard analysis.” We won’t mention it any further.[1] Nonetheless, even though we won’t use infinitely small
numbers, Leibniz’ notation is very useful and we will use it.


2. Direct computation of derivatives 


2.1. Example: the derivative of $f(x) = x^2$ is $f'(x) = 2x$. We have done this computation before
in §2. The result was

$$f'(x) = \lim_{h\to 0} \frac{f(x+h) - f(x)}{h} = \lim_{h\to 0} \frac{(x+h)^2 - x^2}{h} = \lim_{h\to 0} (2x + h) = 2x.$$

Leibniz would have written

$$\frac{dx^2}{dx} = 2x.$$


2.2. The derivative of $g(x) = x$ is $g'(x) = 1$. Indeed, one has

$$g'(x) = \lim_{h\to 0} \frac{g(x+h) - g(x)}{h} = \lim_{h\to 0} \frac{(x+h) - x}{h} = \lim_{h\to 0} \frac{h}{h} = 1.$$

In Leibniz’ notation:

$$\frac{dx}{dx} = 1.$$

This is an example where Leibniz’ notation is most misleading, because if you divide dx by dx then you
should of course get 1. Nonetheless, this is not what is going on. The expression $\frac{dx}{dx}$ is not really a fraction
since there are no two “infinitely small” quantities dx which we are dividing.


2.3. The derivative of any constant function is zero. Let $k(x) = c$ be a constant function. Then
we have

$$k'(x) = \lim_{h\to 0} \frac{k(x+h) - k(x)}{h} = \lim_{h\to 0} \frac{c - c}{h} = \lim_{h\to 0} 0 = 0.$$

Leibniz would have said that if c is a constant, then

$$\frac{dc}{dx} = 0.$$
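The three computations above can be mirrored numerically by evaluating the difference quotient from (18) for shrinking h. This is a minimal sketch; the helper `difference_quotient` and the sample functions are our own names, and the constant 7.0 is an arbitrary choice for c.

```python
def difference_quotient(f, x, h):
    """Return (f(x + h) - f(x)) / h, the quotient whose limit as h -> 0 is f'(x)."""
    return (f(x + h) - f(x)) / h

def square(x):
    return x * x

def identity(x):
    return x

def constant(x):
    return 7.0  # an arbitrary constant c

for h in (0.1, 0.01, 0.001):
    print(h,
          difference_quotient(square, 3.0, h),    # equals 2x + h up to rounding
          difference_quotient(identity, 3.0, h),  # equals 1 up to rounding
          difference_quotient(constant, 3.0, h))  # equals 0
```

For the square the quotient is 2x + h, so at x = 3 it approaches 6; for the other two functions it does not depend on h at all, apart from rounding.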


2.4. Derivative of $x^n$ for n = 1, 2, 3, .... To differentiate $f(x) = x^n$ one proceeds as follows:

$$f'(a) = \lim_{x\to a} \frac{f(x) - f(a)}{x - a} = \lim_{x\to a} \frac{x^n - a^n}{x - a}.$$

We need to simplify the fraction $(x^n - a^n)/(x - a)$. For n = 2 we have

$$\frac{x^2 - a^2}{x - a} = x + a.$$

For n = 1, 2, 3, ... the geometric sum formula tells us that

$$\frac{x^n - a^n}{x - a} = x^{n-1} + x^{n-2}a + x^{n-3}a^2 + \cdots + xa^{n-2} + a^{n-1}. \tag{19}$$

[1] But if you want to read more on this you should see Keisler’s calculus text at

http://www.math.wisc.edu/~keisler/calc.html

I would not recommend using Keisler’s text and this text at the same time, but if you like math you should remember that it
exists, and look at it (later, say, after you pass 221.)
If you don’t remember the geometric sum formula, then you could also just verify (19) by carefully multiplying
both sides with $x - a$. For instance, when n = 3 you would get

$$\begin{aligned}
x \times (x^2 + xa + a^2) &= x^3 + ax^2 + a^2x \\
-a \times (x^2 + xa + a^2) &= \phantom{x^3} - ax^2 - a^2x - a^3 \\
(x - a) \times (x^2 + ax + a^2) &= x^3 - a^3
\end{aligned}$$

With formula (19) in hand we can now easily find the derivative of $x^n$:

$$\begin{aligned}
f'(a) &= \lim_{x\to a} \frac{x^n - a^n}{x - a} \\
&= \lim_{x\to a} \left\{x^{n-1} + x^{n-2}a + x^{n-3}a^2 + \cdots + xa^{n-2} + a^{n-1}\right\} \\
&= a^{n-1} + a^{n-2}a + a^{n-3}a^2 + \cdots + a\,a^{n-2} + a^{n-1}.
\end{aligned}$$

Here there are n terms, and they all are equal to $a^{n-1}$, so the final result is

$$f'(a) = na^{n-1}.$$

One could also write this as $f'(x) = nx^{n-1}$, or, in Leibniz’ notation

$$\frac{dx^n}{dx} = nx^{n-1}.$$

This formula turns out to be true in general, but here we have only proved it for the case in which n is a
positive integer.
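Formula (19) can also be checked mechanically with exact rational arithmetic. The sketch below is only a spot check at sample values, not a proof; the function names and the sample numbers 5/2 and 1/3 are our own choices.

```python
from fractions import Fraction

def geometric_sum(x, a, n):
    """Right-hand side of (19): x^(n-1) + x^(n-2) a + ... + x a^(n-2) + a^(n-1)."""
    return sum(x ** (n - 1 - k) * a ** k for k in range(n))

def power_quotient(x, a, n):
    """Left-hand side of (19): (x^n - a^n) / (x - a), defined only for x != a."""
    return (x ** n - a ** n) / (x - a)

x, a = Fraction(5, 2), Fraction(1, 3)
for n in range(1, 6):
    assert power_quotient(x, a, n) == geometric_sum(x, a, n)

# Setting x = a in the sum gives n copies of a^(n-1), i.e. the derivative n a^(n-1).
print(geometric_sum(Fraction(3), Fraction(3), 5), 5 * Fraction(3) ** 4)
```

The last line evaluates the sum at x = a = 3 with n = 5 next to $n a^{n-1}$; the two numbers agree, which is exactly the step "there are n terms, and they all are equal to $a^{n-1}$."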


3. Differentiable implies Continuous 

3.1. Theorem. If a function f is differentiable at some a in its domain, then f is also continuous at a.

Proof. We are given that

$$\lim_{x\to a} \frac{f(x) - f(a)}{x - a}$$

exists, and we must show that

$$\lim_{x\to a} f(x) = f(a).$$

This follows from the following computation

$$\begin{aligned}
\lim_{x\to a} f(x) &= \lim_{x\to a} \bigl(f(x) - f(a) + f(a)\bigr) && \text{(algebra)} \\
&= \lim_{x\to a} \left(\frac{f(x) - f(a)}{x - a}\cdot(x - a) + f(a)\right) && \text{(more algebra)} \\
&= \left(\lim_{x\to a} \frac{f(x) - f(a)}{x - a}\right)\cdot\lim_{x\to a}(x - a) + \lim_{x\to a} f(a) && \text{(Limit Properties)} \\
&= f'(a)\cdot 0 + f(a) && \text{($f'(a)$ exists)} \\
&= f(a).
\end{aligned}$$

□


4. Some non-differentiable functions 
4.1. A graph with a corner. Consider the function

$$f(x) = |x| = \begin{cases} x & \text{for } x \ge 0, \\ -x & \text{for } x < 0. \end{cases}$$

This function is continuous at all x, but it is not differentiable at x = 0.

To see this try to compute the derivative at 0,

$$f'(0) = \lim_{x\to 0} \frac{|x| - |0|}{x - 0} = \lim_{x\to 0} \frac{|x|}{x} = \lim_{x\to 0} \operatorname{sign}(x).$$

We know this limit does not exist (see §7.2).

If you look at the graph of $f(x) = |x|$ then you see what is wrong: the graph has a corner at the origin
and it is not clear which line, if any, deserves to be called the tangent to the graph at the origin.
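The corner shows up immediately if you compute the difference quotient of $|x|$ at 0 separately from the right and from the left. The helper below is a hypothetical name introduced for this sketch.

```python
def one_sided_quotients(f, a, h):
    """Return the difference quotients of f at a taken from the right and from the left."""
    from_right = (f(a + h) - f(a)) / h
    from_left = (f(a - h) - f(a)) / (-h)
    return from_right, from_left

for h in (0.1, 0.01, 0.001):
    print(h, one_sided_quotients(abs, 0.0, h))
```

No matter how small h is, the quotient from the right is 1 and the quotient from the left is -1, so there is no single number the two-sided limit could be. For a function such as $x^2$ the two values move together as h shrinks.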



4.2. A graph with a cusp. Another example of a function without a derivative at x = 0 is

$$f(x) = \sqrt{|x|}.$$

When you try to compute the derivative you get this limit

$$f'(0) = \lim_{x\to 0} \frac{\sqrt{|x|}}{x} = \;?$$

The limit from the right is

$$\lim_{x\searrow 0} \frac{\sqrt{x}}{x} = \lim_{x\searrow 0} \frac{1}{\sqrt{x}},$$

which does not exist (it is “$+\infty$”). Likewise, the limit from the left also does not exist (’tis $-\infty$). Nonetheless,
a drawing for the graph of f suggests an obvious tangent to the graph at x = 0, namely, the y-axis. That
observation does not give us a derivative, because the y-axis is vertical and hence has no slope.


4.3. A graph with absolutely no tangents, anywhere. The previous two examples were about
functions which did not have a derivative at x = 0. In both examples the point x = 0 was the only point where
the function failed to have a derivative. It is easy to give examples of functions which are not differentiable
at more than one value of x, but here I would like to show you a function f which doesn’t have a derivative
anywhere in its domain.

To keep things short I won’t write a formula for the function, and merely show you a graph. In this
graph you see a typical path of a Brownian motion, i.e. t is time, and x(t) is the position of a particle which
undergoes a Brownian motion. Come to lecture for further explanation (see also the article on Wikipedia).
To see a similar graph check the Dow Jones or Nasdaq in the upper left hand corner of the web page at
http://finance.yahoo.com in the afternoon on any weekday.
5. Exercises


101. Compute the derivative of the following functions

$$f(x) = x^2 - 2x, \qquad g(x) = \frac{1}{x}, \qquad k(x) = x^3 - 17x,$$

$$u(x) = \frac{2}{1 + x}, \qquad v(x) = \sqrt{x}, \qquad w(x) = \frac{1}{\sqrt{x}}$$

using either (17) or (18).

102. Which of the following functions is differentiable at
x = 0?

$$f(x) = x|x|, \qquad g(x) = x\sqrt{|x|}, \qquad h(x) = x + |x|,$$

$$k(x) = x^2\sin\frac{\pi}{x}, \qquad \ell(x) = x\sin\frac{\pi}{x}.$$

These formulas do not define $k$ and $\ell$ at x = 0. We define
$k(0) = \ell(0) = 0$.

103. For which value(s) of a and b is the function defined by

$$f(x) = \begin{cases} ax + b & \text{for } x < 0 \\ x - x^2 & \text{for } x \ge 0 \end{cases}$$

differentiable at x = 0? Sketch the graph of the function
f for the values a and b you found.

104. For which value(s) of a and b is the function defined by

$$f(x) = \begin{cases} ax^2 + b & \text{for } x < 1 \\ x - x^2 & \text{for } x \ge 1 \end{cases}$$

differentiable at x = 0? Sketch the graph of the function
f for the values a and b you found.

105. For which value(s) of a and b is the function defined by

$$f(x) = \begin{cases} ax^2 & \text{for } x < 2 \\ x + b & \text{for } x \ge 2 \end{cases}$$

differentiable at x = 0? Sketch the graph of the function
f for the values a and b you found.

106. Group Problem.

True or false: If a function f is continuous at some
x = a then it must also be differentiable at x = a?

107. Group Problem.

True or false: If a function f is differentiable at
some x = a then it must also be continuous at x = a?


Figure 3. A Brownian motion. Note how the graph doesn’t seem to have a tangent anywhere at all.
6. The Differentiation Rules


You could go on and compute more derivatives from the definition. Each time you would have to compute
a new limit, and hope that there is some trick that allows you to find that limit. This is fortunately not
necessary. It turns out that if you know a few basic derivatives (such as $dx^n/dx = nx^{n-1}$) then you can find
derivatives of arbitrarily complicated functions by breaking them into smaller pieces. In this section we’ll
look at rules which tell you how to differentiate a function which is either the sum, difference, product or
quotient of two other functions.





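                  Function notation                                   Leibniz notation

Constant rule:    $c' = 0$                                            $\dfrac{dc}{dx} = 0$

Sum rule:         $(u \pm v)' = u' \pm v'$                            $\dfrac{d(u \pm v)}{dx} = \dfrac{du}{dx} \pm \dfrac{dv}{dx}$

Product rule:     $(u\cdot v)' = u'\cdot v + u\cdot v'$               $\dfrac{d(uv)}{dx} = \dfrac{du}{dx}\,v + u\,\dfrac{dv}{dx}$

Quotient rule:    $\left(\dfrac{u}{v}\right)' = \dfrac{u'\cdot v - u\cdot v'}{v^2}$      $\dfrac{d(u/v)}{dx} = \dfrac{v\,\frac{du}{dx} - u\,\frac{dv}{dx}}{v^2}$


Table 1. The differentiation rules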


The situation is analogous to that of the “limit-properties” (P1)...(P6) from the previous chapter which
allowed us to compute limits without always having to go back to the epsilon-delta definition.


6.1. Sum, product and quotient rules. In the following c and n are constants, u and v are functions
of x, and ' denotes differentiation. The Differentiation Rules in function notation, and Leibniz notation, are
listed in Table 1.

Note that we already proved the Constant Rule in §2.3. We will now prove the sum, product and
quotient rules.
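Before the proofs, here is a quick numerical sanity check of the product and quotient rules. It is a sketch under our own assumptions: the sample functions u and v are arbitrary polynomials we picked, and the derivative of the product and quotient is estimated with a symmetric difference quotient rather than computed exactly.

```python
def central_difference(f, x, h=1e-5):
    """Approximate f'(x) by the symmetric quotient (f(x + h) - f(x - h)) / (2h)."""
    return (f(x + h) - f(x - h)) / (2 * h)

# Two sample functions with known derivatives (arbitrary choices).
def u(x):
    return x**3 + 1

def du(x):
    return 3 * x**2

def v(x):
    return x**2 + 2

def dv(x):
    return 2 * x

def product_rule(x):
    return du(x) * v(x) + u(x) * dv(x)

def quotient_rule(x):
    return (du(x) * v(x) - u(x) * dv(x)) / v(x) ** 2

x0 = 1.5
print(central_difference(lambda x: u(x) * v(x), x0), product_rule(x0))
print(central_difference(lambda x: u(x) / v(x), x0), quotient_rule(x0))
```

In each printed pair the numerical estimate and the value given by the rule agree to many digits, which is reassuring but of course no substitute for the proofs.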


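6.2. Proof of the Sum Rule. Suppose that $f(x) = u(x) + v(x)$ for all x where u and v are differentiable.
Then

$$\begin{aligned}
f'(a) &= \lim_{x\to a} \frac{f(x) - f(a)}{x - a} && \text{(definition of $f'$)} \\
&= \lim_{x\to a} \frac{\bigl(u(x) + v(x)\bigr) - \bigl(u(a) + v(a)\bigr)}{x - a} && \text{(use $f = u + v$)} \\
&= \lim_{x\to a} \left(\frac{u(x) - u(a)}{x - a} + \frac{v(x) - v(a)}{x - a}\right) && \text{(algebra)} \\
&= \lim_{x\to a} \frac{u(x) - u(a)}{x - a} + \lim_{x\to a} \frac{v(x) - v(a)}{x - a} && \text{(limit property)} \\
&= u'(a) + v'(a) && \text{(definition of $u'$, $v'$)}
\end{aligned}$$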


6.3. Proof of the Product Rule. Let $f(x) = u(x)v(x)$. To find the derivative we must express the
change of f in terms of the changes of u and v

$$\begin{aligned}
f(x) - f(a) &= u(x)v(x) - u(a)v(a) \\
&= u(x)v(x) - u(x)v(a) + u(x)v(a) - u(a)v(a) \\
&= u(x)\bigl(v(x) - v(a)\bigr) + \bigl(u(x) - u(a)\bigr)v(a).
\end{aligned}$$

Now divide by $x - a$ and let $x \to a$:

$$\begin{aligned}
\lim_{x\to a} \frac{f(x) - f(a)}{x - a} &= \lim_{x\to a} u(x)\frac{v(x) - v(a)}{x - a} + \lim_{x\to a} \frac{u(x) - u(a)}{x - a}\,v(a) \\
&\qquad\text{(use the limit properties)} \\
&= \Bigl(\lim_{x\to a} u(x)\Bigr)\Bigl(\lim_{x\to a} \frac{v(x) - v(a)}{x - a}\Bigr) + \Bigl(\lim_{x\to a} \frac{u(x) - u(a)}{x - a}\Bigr)v(a) \\
&= u(a)v'(a) + u'(a)v(a),
\end{aligned}$$

as claimed. In this last step we have used that

$$\lim_{x\to a} \frac{u(x) - u(a)}{x - a} = u'(a) \quad\text{and}\quad \lim_{x\to a} \frac{v(x) - v(a)}{x - a} = v'(a)$$

and also that

$$\lim_{x\to a} u(x) = u(a).$$

This last limit follows from the fact that u is continuous, which in turn follows from the fact that u is
differentiable.


6.4. Proof of the Quotient Rule. We can break the proof into two parts. First we do the special
case where $f(x) = 1/v(x)$, and then we use the product rule to differentiate

$$f(x) = \frac{u(x)}{v(x)} = u(x)\cdot\frac{1}{v(x)}.$$

So let $f(x) = 1/v(x)$. We can express the change in f in terms of the change in v

$$f(x) - f(a) = \frac{1}{v(x)} - \frac{1}{v(a)} = -\frac{v(x) - v(a)}{v(x)v(a)}.$$

Dividing by $x - a$ we get

$$\frac{f(x) - f(a)}{x - a} = -\frac{1}{v(x)v(a)}\,\frac{v(x) - v(a)}{x - a}.$$

Now we want to take the limit $x \to a$. We are given that v is differentiable, so it must also be continuous and
hence

$$\lim_{x\to a} v(x) = v(a).$$

Therefore we find

$$\lim_{x\to a} \frac{f(x) - f(a)}{x - a} = -\lim_{x\to a} \frac{1}{v(x)v(a)}\,\lim_{x\to a} \frac{v(x) - v(a)}{x - a} = -\frac{v'(a)}{v(a)^2}.$$

That completes the first step of the proof. In the second step we use the product rule to differentiate $f = u/v$

$$f' = \left(\frac{u}{v}\right)' = \left(u\cdot\frac{1}{v}\right)' = u'\cdot\frac{1}{v} + u\cdot\left(\frac{1}{v}\right)' = \frac{u'}{v} - u\,\frac{v'}{v^2} = \frac{u'v - uv'}{v^2}.$$

6.5. A shorter, but not quite perfect derivation of the Quotient Rule. The Quotient Rule
can be derived from the Product Rule as follows: if $w = u/v$ then

$$w\cdot v = u. \tag{20}$$

By the product rule we have

$$w'\cdot v + w\cdot v' = u',$$

so that

$$w' = \frac{u' - w\cdot v'}{v} = \frac{u' - (u/v)\cdot v'}{v} = \frac{u'\cdot v - u\cdot v'}{v^2}.$$

Unlike the proof in §6.4 above, this argument does not prove that w is differentiable if u and v are. It only
says that if the derivative exists then it must be what the Quotient Rule says it is.

The trick which is used here is a special case of a method called “implicit differentiation.” We have an
equation (20) which the quotient w satisfies, and by differentiating this equation we find $w'$.
6.6. Differentiating a constant multiple of a function. Note that the rule

$$(cu)' = cu'$$

follows from the Constant Rule and the Product Rule.

6.7. Picture of the Product Rule. If u and v are quantities which depend on x, and if increasing x
by $\Delta x$ causes u and v to change by $\Delta u$ and $\Delta v$, then the product of u and v will change by

$$\Delta(uv) = (u + \Delta u)(v + \Delta v) - uv = u\Delta v + v\Delta u + \Delta u\Delta v. \tag{21}$$

If u and v are differentiable functions of x, then the changes $\Delta u$ and $\Delta v$ will be of the same order of magnitude
as $\Delta x$, and thus one expects $\Delta u\Delta v$ to be much smaller. One therefore ignores the last term in (21), and thus
arrives at

$$\Delta(uv) = u\Delta v + v\Delta u.$$

Leibniz would now divide by $\Delta x$ and replace $\Delta$’s by d’s to get the product rule:

$$\frac{\Delta(uv)}{\Delta x} = u\frac{\Delta v}{\Delta x} + v\frac{\Delta u}{\Delta x}.$$


Figure 4. The Product Rule. How much does the area of a rectangle change if its sides u and v are
increased by $\Delta u$ and $\Delta v$? Most of the increase is accounted for by the two thin rectangles whose areas
are $u\Delta v$ and $v\Delta u$. So the increase in area is approximately $u\Delta v + v\Delta u$, which explains why the product
rule says $(uv)' = uv' + vu'$.
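The rectangle picture can be checked with exact arithmetic: the part of the change in area that the two thin rectangles miss is precisely the small corner $\Delta u\,\Delta v$. In this sketch the side lengths 3 and 5 and the function names are arbitrary choices of ours.

```python
from fractions import Fraction

def exact_change(u, v, du, dv):
    """Exact change of the product: (u + du)(v + dv) - uv, as in (21)."""
    return (u + du) * (v + dv) - u * v

def thin_rectangles(u, v, du, dv):
    """The two thin rectangles u*dv + v*du that the product rule keeps."""
    return u * dv + v * du

u, v = Fraction(3), Fraction(5)
for k in range(1, 5):
    du = dv = Fraction(1, 10 ** k)
    leftover = exact_change(u, v, du, dv) - thin_rectangles(u, v, du, dv)
    print(du, leftover)  # the leftover equals du * dv
```

Each time the increments shrink by a factor 10, the thin rectangles shrink by a factor 10 but the leftover corner shrinks by a factor 100, which is why it may be ignored in the limit.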


7. Differentiating powers of functions 

7.1. Product rule with more than one factor. If a function is given as the product of n functions,
i.e.

$$f(x) = u_1(x) \times u_2(x) \times \cdots \times u_n(x),$$

then you can differentiate it by applying the product rule n − 1 times (there are n factors, so there are n − 1
multiplications.)

After the first step you would get

$$f' = u_1'\,(u_2 \cdots u_n) + u_1\,(u_2 \cdots u_n)'.$$

In the second step you apply the product rule to $(u_2 u_3 \cdots u_n)'$. This yields

$$\begin{aligned}
f' &= u_1' u_2 \cdots u_n + u_1\bigl[u_2' u_3 \cdots u_n + u_2 (u_3 \cdots u_n)'\bigr] \\
&= u_1' u_2 \cdots u_n + u_1 u_2' u_3 \cdots u_n + u_1 u_2 (u_3 \cdots u_n)'.
\end{aligned}$$

Continuing this way one finds after n − 1 applications of the product rule that

$$(u_1 \cdots u_n)' = u_1' u_2 \cdots u_n + u_1 u_2' u_3 \cdots u_n + \cdots + u_1 u_2 u_3 \cdots u_n'. \tag{22}$$


7.2. The Power rule. If all n factors in the previous paragraph are the same, so that the function f
is the nth power of some other function,

$$f(x) = \bigl(u(x)\bigr)^n,$$

then all terms in the right hand side of (22) are the same, and, since there are n of them, one gets

$$f'(x) = n u^{n-1}(x)\,u'(x),$$

or, in Leibniz’ notation,

$$\frac{du^n}{dx} = n u^{n-1}\frac{du}{dx}. \tag{23}$$

7.3. The Power Rule for Negative Integer Exponents. We have just proved the power rule (23)
assuming n is a positive integer. The rule actually holds for all real exponents n, but the proof is harder.

Here we prove the Power Rule for negative exponents using the Quotient Rule. Suppose $n = -m$ where
m is a positive integer. Then the Quotient Rule tells us that

$$(u^n)' = (u^{-m})' = \left(\frac{1}{u^m}\right)' \overset{\text{Q.R.}}{=} -\frac{(u^m)'}{(u^m)^2}.$$

Since m is a positive integer, we can use (23), so $(u^m)' = m u^{m-1} u'$, and hence

$$(u^n)' = -\frac{m u^{m-1}\cdot u'}{u^{2m}} = -m u^{-m-1}\cdot u' = n u^{n-1} u'.$$


7.4. The Power Rule for Rational Exponents. So far we have proved that the power law holds if
the exponent n is an integer.

We will now see how you can show that the power law holds even if the exponent n is any fraction,
n = p/q. The following derivation contains the trick called implicit differentiation which we will study in
more detail in Section 15.

So let n = p/q where p and q are integers and consider the function

$$w(x) = u(x)^{p/q}.$$

Assuming that both u and w are differentiable functions, we will show that

$$w'(x) = \frac{p}{q}\,u(x)^{\frac{p}{q} - 1}\,u'(x). \tag{24}$$

Raising both sides of $w(x) = u(x)^{p/q}$ to the qth power gives

$$w(x)^q = u(x)^p.$$

Here the exponents p and q are integers, so we may apply the Power Rule to both sides. We get

$$q w^{q-1}\cdot w' = p u^{p-1}\cdot u'.$$

Dividing both sides by $q w^{q-1}$ and substituting $u^{p/q}$ for w gives

$$w' = \frac{p u^{p-1}\cdot u'}{q w^{q-1}} = \frac{p u^{p-1}\cdot u'}{q u^{p(q-1)/q}} = \frac{p u^{p-1}\cdot u'}{q u^{p - (p/q)}} = \frac{p}{q}\,u^{(p/q) - 1}\cdot u',$$

which is the Power Rule for n = p/q.

This proof is flawed because we did not show that $w(x) = u(x)^{p/q}$ is differentiable: we only showed what
the derivative should be, if it exists.
7.5. Derivative of $x^n$ for integer n. If you choose the function u(x) in the Power Rule to be u(x) = x,
then u'(x) = 1, and hence the derivative of $f(x) = u(x)^n = x^n$ is

$$f'(x) = n u(x)^{n-1} u'(x) = n x^{n-1}\cdot 1 = n x^{n-1}.$$

We already knew this of course.

7.6. Example: differentiate a polynomial. Using the Differentiation Rules you can easily differentiate
any polynomial and hence any rational function. For example, using the Sum Rule, the Power Rule
with u(x) = x, and the rule $(cu)' = cu'$, the derivative of the polynomial

$$f(x) = 2x^4 - x^3 + 7$$

is

$$f'(x) = 8x^3 - 3x^2.$$


7.7. Example: differentiate a rational function. By the Quotient Rule the derivative of the
function

$$g(x) = \frac{2x^4 - x^3 + 7}{1 + x^2}$$

is

$$g'(x) = \frac{(8x^3 - 3x^2)(1 + x^2) - (2x^4 - x^3 + 7)\cdot 2x}{(1 + x^2)^2} = \frac{4x^5 - x^4 + 8x^3 - 3x^2 - 14x}{(1 + x^2)^2}.$$

If you compare this example with the previous then you see that polynomials simplify when you differentiate
them while rational functions become more complicated.
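Multiplying out a quotient-rule numerator by hand invites slips, so it is worth checking the expansion. The sketch below compares the unexpanded quotient-rule expression for $g(x) = (2x^4 - x^3 + 7)/(1 + x^2)$ with the expanded numerator $4x^5 - x^4 + 8x^3 - 3x^2 - 14x$, exactly at a few rational points and roughly against a symmetric difference quotient. The function names and sample points are our own.

```python
from fractions import Fraction

def g(x):
    return (2 * x**4 - x**3 + 7) / (1 + x**2)

def g_prime_quotient_rule(x):
    """Quotient rule with u = 2x^4 - x^3 + 7, u' = 8x^3 - 3x^2, v = 1 + x^2, v' = 2x."""
    return ((8 * x**3 - 3 * x**2) * (1 + x**2)
            - (2 * x**4 - x**3 + 7) * (2 * x)) / (1 + x**2) ** 2

def g_prime_expanded(x):
    """The same derivative with the numerator multiplied out."""
    return (4 * x**5 - x**4 + 8 * x**3 - 3 * x**2 - 14 * x) / (1 + x**2) ** 2

# Exact comparison at a few rational points.
for x in (Fraction(-2), Fraction(1, 3), Fraction(1), Fraction(5, 2)):
    assert g_prime_quotient_rule(x) == g_prime_expanded(x)

# Rough numerical comparison with a symmetric difference quotient at x = 1.
h = 1e-5
print((g(1 + h) - g(1 - h)) / (2 * h), g_prime_expanded(1.0))
```

Agreement at a handful of points does not prove a polynomial identity, but a wrong coefficient in the numerator would make the exact comparison fail at once.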

7.8. Derivative of the square root. The derivative of $f(x) = \sqrt{x} = x^{1/2}$ is

$$f'(x) = \frac{1}{2}x^{1/2 - 1} = \frac{1}{2}x^{-1/2} = \frac{1}{2x^{1/2}} = \frac{1}{2\sqrt{x}},$$

where we used the power rule with n = 1/2 and u(x) = x.


8. Exercises 


108. Let $f(x) = (x^2 + 1)(x^3 + 3)$. Find $f'(x)$ in two ways:

(a) by multiplying and then differentiating,

(b) by using the product rule.

Are your answers the same?

109. Let $f(x) = (1 + x^2)^4$. Find $f'(x)$ in two ways, first by
expanding to get an expression for f(x) as a polynomial
in x and then differentiating, and then by using the power
rule. Are the answers the same?

110. Prove the statement in §6.6, i.e. show that $(cu)' = c(u')$
follows from the product rule.

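Compute the derivatives of the following functions.

(try to simplify your answers)

111. $f(x) = x + 1 + (x + 1)^2$

112. $f(x) = \dfrac{x - 2}{x^4 + 1}$

113. $f(x) = \left(\dfrac{1}{1 + x}\right)^{-1}$

114. $f(x) = \sqrt{1 - x^2}$

115. $f(x) = \dfrac{ax + b}{cx + d}$

116. $f(x) = \dfrac{1}{(1 + x^2)^2}$

117. $f(x) = \dfrac{x}{1 + \sqrt{x}}$

118. $f(x) = \sqrt{\dfrac{1 - x}{1 + x}}$

119. $f(x) = \sqrt[3]{x + \sqrt{x}}$

120. $\varphi(t) = \dfrac{t}{1 + \sqrt{t}}$

121. $g(s) = \sqrt{\dfrac{1 - s}{1 + s}}$

122. $h(\rho) = \sqrt[3]{\rho + \sqrt{\rho}}$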

123. Group Problem.

Using derivatives to approximate numbers.

(a) Find the derivative of $f(x) = x^{4/3}$.

(b) Use (a) to estimate the number

$$\frac{127^{4/3} - 125^{4/3}}{2}$$

approximately without a calculator. Your answer should
have the form p/q where p and q are integers. [Hint:
Note that $5^3 = 125$ and take a good look at equation (17).]

(c) Approximate in the same way the numbers $\sqrt{143}$ and
$\sqrt{145}$ (Hint: 12 × 12 = 144).

124. Group Problem.

(Making the product and quotient rules look nicer.)
Instead of looking at the derivative of a function you can
look at the ratio of its derivative to the function itself, i.e.
you can compute $f'/f$. This quantity is called the logarithmic
derivative of the function f for reasons that
will become clear later this semester.

(a) Compute the logarithmic derivative of these functions
(i.e. find $f'(x)/f(x)$)

$$f(x) = x, \qquad g(x) = 3x, \qquad h(x) = x^2,$$

$$k(x) = -x^2, \qquad \ell(x) = 2007x^2, \qquad m(x) = x^{2007}$$

(b) Show that for any pair of functions u and v one
has

$$\frac{(uv)'}{uv} = \frac{u'}{u} + \frac{v'}{v}, \qquad \frac{(u/v)'}{u/v} = \frac{u'}{u} - \frac{v'}{v}.$$

125. (a) Find $f'(x)$ and $g'(x)$ if

$$f(x) = \frac{1 + x^2}{2x^4 + 7}, \qquad g(x) = \frac{2x^4 + 7}{1 + x^2}.$$

Note that $f(x) = 1/g(x)$.

(b) Is it true that $f'(x) = 1/g'(x)$?

(c) Is it true that $f(x) = g^{-1}(x)$?

(d) Is it true that $f(x) = g(x)^{-1}$?

126. Group Problem. (a) Let $x(t) = (1 - t^2)/(1 + t^2)$,
$y(t) = 2t/(1 + t^2)$ and $u(t) = y(t)/x(t)$. Find $dx/dt$,
$dy/dt$.

(b) Now that you’ve done (a) there are two different ways
of finding $du/dt$. What are they? Use one of them to
find $du/dt$.

9. Higher Derivatives 


9.1. The derivative is a function. If the derivative $f'(a)$ of some function f exists for all a in the
domain of f, then we have a new function: namely, for each number in the domain of f we compute the
derivative of f at that number. This function is called the derivative function of f, and it is denoted by $f'$.
Now that we have agreed that the derivative of a function is a function, we can repeat the process and try to
differentiate the derivative. The result, if it exists, is called the second derivative of f. It is denoted $f''$.
The derivative of the second derivative is called the third derivative, written $f'''$, and so on.

The nth derivative of f is denoted $f^{(n)}$. Thus

$$f^{(0)} = f, \quad f^{(1)} = f', \quad f^{(2)} = f'', \quad f^{(3)} = f''', \ldots.$$

Leibniz’ notation for the nth derivative of $y = f(x)$ is

$$\frac{d^n y}{dx^n}.$$


9.2. Example. If $f(x) = x^2 - 2x + 3$ then

$$\begin{aligned}
f(x) &= x^2 - 2x + 3 \\
f'(x) &= 2x - 2 \\
f''(x) &= 2 \\
f^{(3)}(x) &= 0 \\
f^{(4)}(x) &= 0
\end{aligned}$$

All further derivatives of f are zero.
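For polynomials, repeated differentiation is a purely mechanical operation on the list of coefficients. The following sketch represents a polynomial by the hypothetical convention `[c0, c1, c2, ...]` for $c_0 + c_1x + c_2x^2 + \cdots$ and uses the empty list for the zero polynomial; both conventions are our own.

```python
def differentiate(coeffs):
    """Differentiate a polynomial given as [c0, c1, c2, ...] meaning c0 + c1 x + c2 x^2 + ..."""
    return [k * c for k, c in enumerate(coeffs)][1:]

def nth_derivative(coeffs, n):
    """Apply `differentiate` n times; the empty list stands for the zero polynomial."""
    for _ in range(n):
        coeffs = differentiate(coeffs)
    return coeffs

f = [3, -2, 1]  # x^2 - 2x + 3
for n in range(5):
    print(n, nth_derivative(f, n))
```

Each differentiation drops the degree by one, so a polynomial of degree 2 is reduced to zero from the third derivative on.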
9.3. Operator notation. A common variation on Leibniz’ notation for derivatives is the so-called
operator notation, as in

$$\frac{d(x^3 - x)}{dx} = \frac{d}{dx}(x^3 - x) = 3x^2 - 1.$$

For higher derivatives one can write

$$\frac{d^2 f}{dx^2} = \left(\frac{d}{dx}\right)^2 f.$$

Be careful to distinguish the second derivative from the square of the first derivative. Usually

$$\frac{d^2 f}{dx^2} \neq \left(\frac{df}{dx}\right)^2.$$

10. Exercises 


127. The equation

$$\frac{2x}{x^2 - 1} = \frac{1}{x + 1} + \frac{1}{x - 1}$$

holds for all values of x (except x = ±1), so you should
get the same answer if you differentiate both sides. Check
this.

Compute the third derivative of $f(x) = 2x/(x^2 - 1)$
by using either the left or right hand side (your choice).

128. Compute the first, second and third derivatives of the
following functions

$$f(x) = (x + 1)^4, \qquad g(x) = (x^2 + 1)^4, \qquad h(x) = \sqrt{x - 2}, \qquad k(x) = \sqrt[3]{x - \frac{1}{x}}$$

129. Find the derivatives of 10th order of the functions

$$f(x) = x^{12} + x^8, \qquad g(x) = \frac{1}{x}, \qquad h(x) = \frac{12}{1 - x}, \qquad k(x) = \frac{x^2}{1 - x}$$

130. Find $f'(x)$, $f''(x)$ and $f^{(3)}(x)$ if

$$f(x) = 1 + x + \frac{x^2}{2} + \frac{x^3}{6} + \frac{x^4}{24} + \frac{x^5}{120} + \frac{x^6}{720}.$$

131. Group Problem.

(a) Find the 12th derivative of the function $f(x) = \dfrac{1}{x + 2}$.

(b) Find the nth order derivative of $f(x) = \dfrac{1}{x + 2}$ (i.e.
find a formula for $f^{(n)}(x)$ which is valid for all n =
0, 1, 2, 3, ...).

(c) Find the nth order derivative of $g(x) = \dfrac{x}{x + 2}$.

132. (About notation.)

(a) Find $dy/dx$ and $d^2y/dx^2$ if $y = x/(x + 2)$.

(b) Find $du/dt$ and $d^2u/dt^2$ if $u = t/(t + 2)$. Hint:
See previous problem.

(c) Find $\dfrac{d}{dx}\left(\dfrac{x}{x + 2}\right)$ and $\dfrac{d^2}{dx^2}\left(\dfrac{x}{x + 2}\right)$. Hint:
See previous problem.

(d) Find $\dfrac{d}{dx}\left(\dfrac{x}{x + 2}\right)\Big|_{x=1}$ and $\dfrac{d}{dx}\left(\dfrac{1}{1 + 2}\right)$.

133. Find $d^2y/dx^2$ and $(dy/dx)^2$ if $y = x^3$.


11. Differentiating Trigonometric functions 

The trigonometric functions Sine, Cosine and Tangent are differentiable, and their derivatives are given
by the following formulas

$$\frac{d\sin x}{dx} = \cos x, \qquad \frac{d\cos x}{dx} = -\sin x, \qquad \frac{d\tan x}{dx} = \frac{1}{\cos^2 x}. \tag{25}$$

Note the minus sign in the derivative of the cosine!


PROOF. By definition one has

$$\sin'(x) = \lim_{h\to 0} \frac{\sin(x + h) - \sin(x)}{h}.$$

To simplify the numerator we use the trigonometric addition formula

$$\sin(\alpha + \beta) = \sin\alpha\cos\beta + \cos\alpha\sin\beta$$

with $\alpha = x$ and $\beta = h$, which results in

$$\begin{aligned}
\frac{\sin(x + h) - \sin(x)}{h} &= \frac{\sin(x)\cos(h) + \cos(x)\sin(h) - \sin(x)}{h} \\
&= \cos(x)\frac{\sin(h)}{h} + \sin(x)\frac{\cos(h) - 1}{h}.
\end{aligned}$$

Hence by the formulas

$$\lim_{h\to 0} \frac{\sin(h)}{h} = 1 \quad\text{and}\quad \lim_{h\to 0} \frac{\cos(h) - 1}{h} = 0$$

from Section 13 we have

$$\begin{aligned}
\sin'(x) &= \lim_{h\to 0} \left\{\cos(x)\frac{\sin h}{h} + \sin(x)\frac{\cos(h) - 1}{h}\right\} \\
&= \cos(x)\cdot 1 + \sin(x)\cdot 0 \\
&= \cos(x).
\end{aligned}$$

A similar computation leads to the stated derivative of $\cos x$.

To find the derivative of $\tan x$ we apply the quotient rule to

$$\tan x = \frac{\sin x}{\cos x} = \frac{f(x)}{g(x)}.$$

We get

$$\tan'(x) = \frac{\cos(x)\sin'(x) - \sin(x)\cos'(x)}{\cos^2(x)} = \frac{\cos^2(x) + \sin^2(x)}{\cos^2(x)} = \frac{1}{\cos^2(x)},$$

as claimed.

□
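The three formulas in (25) can be spot-checked numerically. This is a minimal sketch with a hypothetical helper `central_difference`; it estimates each derivative at x = 1 (radians) with a symmetric difference quotient and prints it next to the value from (25).

```python
import math

def central_difference(f, x, h=1e-5):
    """Approximate f'(x) by the symmetric quotient (f(x + h) - f(x - h)) / (2h)."""
    return (f(x + h) - f(x - h)) / (2 * h)

x = 1.0
print(central_difference(math.sin, x), math.cos(x))
print(central_difference(math.cos, x), -math.sin(x))
print(central_difference(math.tan, x), 1 / math.cos(x) ** 2)
```

The minus sign in the derivative of the cosine is clearly visible in the second line of output.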


12. Exercises 


Find the derivatives of the following functions (try to
simplify your answers)

134. $f(x) = \sin(x) + \cos(x)$

135. $f(x) = 2\sin(x) - 3\cos(x)$

136. $f(x) = 3\sin(x) + 2\cos(x)$

137. $f(x) = x\sin(x) + \cos(x)$

138. $f(x) = x\cos(x) - \sin x$

139. $f(x) = \dfrac{\sin x}{x}$

140. $f(x) = \cos^2(x)$

141. $f(x) = \sqrt{1 - \sin^2 x}$

142. $f(x) = \sqrt{\dfrac{1 - \sin x}{1 + \sin x}}$

143. $\cot(x) = \dfrac{\cos x}{\sin x}$.

144. Can you find a and b so that the function

$$f(x) = \begin{cases} \cos x & \text{for } x \le \frac{\pi}{4} \\ a + bx & \text{for } x > \frac{\pi}{4} \end{cases}$$

is differentiable at $x = \pi/4$?

145. Can you find a and b so that the function

$$f(x) = \begin{cases} \tan x & \text{for } x < \frac{\pi}{6} \\ a + bx & \text{for } x \ge \frac{\pi}{6} \end{cases}$$

is differentiable at $x = \pi/4$?

146. If f is a given function, and you have another function
g which satisfies $g(x) = f(x) + 12$ for all x, then f and g
have the same derivatives. Prove this. [Hint: it's a short
proof - use the differentiation rules.]

147. Group Problem.

Show that the functions

$$f(x) = \sin^2 x \quad\text{and}\quad g(x) = -\cos^2 x$$

have the same derivative by computing $f'(x)$ and $g'(x)$.
With hindsight this was to be expected - why?

148. Find the first and second derivatives of the functions

$$f(x) = \tan^2 x \quad\text{and}\quad g(x) = \frac{1}{\cos^2 x}.$$

Hint: remember your trig to reduce work!
A depends on B depends on C depends on...


Someone is pumping water into a balloon. Assuming
that the balloon is spherical you can say how large it
is by specifying its radius r. For a growing balloon
this radius will change with time t.

The volume of the balloon is a function of its radius,
since the volume of a sphere of radius r is given
by

$$V = \frac{4}{3}\pi r^3.$$

We now have two functions: the first, f, tells
you the radius r of the balloon at time t,

$$r = f(t),$$

and the second tells you the volume of the balloon
given its radius

$$V = g(r).$$

The volume of the balloon at time t is then given by

$$V = g(f(t)) = g \circ f(t),$$

i.e. the function which tells you the volume of the
balloon at time t is the composition of first f and
then g.

Schematically we can summarize this chain of
cause-and-effect relations as follows: you could either
say that V depends on r, and r depends on t,

time t  →  radius r (depends on time t)  →  volume V (depends on radius r)

or you could say that V depends directly on t:

time t  →  volume V (depends on time t)


Figure 5. A “real world example” of a composition of functions.


13. The Chain Rule 

13.1. Composition of functions. Given two functions f and g, one can define a new function called
the composition of f and g. The notation for the composition is $f \circ g$, and it is defined by the formula

$$f \circ g(x) = f(g(x)).$$

The domain of the composition is the set of all numbers x for which this formula gives you something
well-defined.

For instance, if $f(x) = x^2 + x$ and $g(x) = 2x + 1$ then

$$f \circ g(x) = f(2x + 1) = (2x + 1)^2 + (2x + 1)$$

$$\text{and}\quad g \circ f(x) = g(x^2 + x) = 2(x^2 + x) + 1.$$

Note that $f \circ g$ and $g \circ f$ are not the same function in this example (they hardly ever are the same).

If you think of functions as expressing dependence of one quantity on another, then the composition of
functions arises as follows. If a quantity z is a function of another quantity y, and if y itself depends on x,
then z depends on x via y.

To get $f \circ g$ from the previous example, we could say $z = f(y)$ and $y = g(x)$, so that

$$z = f(y) = y^2 + y \quad\text{and}\quad y = 2x + 1.$$

Given x one can compute y, and from y one can then compute z. The result will be

$$z = y^2 + y = (2x + 1)^2 + (2x + 1),$$

in other notation,

$$z = f(y) = f(g(x)) = f \circ g(x).$$

One says that the composition of f and g is the result of substituting g in f.
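Composition translates directly into code: composing two functions means feeding the output of one into the other. The sketch below uses the example above; the helper `compose` is our own name.

```python
def compose(f, g):
    """Return the function x -> f(g(x)), i.e. the composition "f after g"."""
    return lambda x: f(g(x))

def f(x):
    return x**2 + x

def g(x):
    return 2 * x + 1

f_after_g = compose(f, g)  # x -> (2x + 1)^2 + (2x + 1)
g_after_f = compose(g, f)  # x -> 2(x^2 + x) + 1

for x in (0, 1, 2):
    print(x, f_after_g(x), g_after_f(x))
```

The two columns of output differ, which illustrates that the order of composition matters.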
13.2. Theorem (Chain Rule). If f and g are differentiable, so is the composition $f \circ g$.

The derivative of $f \circ g$ is given by

$$(f \circ g)'(x) = f'(g(x))\,g'(x).$$

The chain rule tells you how to find the derivative of the composition $f \circ g$ of two functions f and g
provided you know how to differentiate the two functions f and g.

When written in Leibniz’ notation the chain rule looks particularly easy. Suppose that $y = g(x)$ and
$z = f(y)$, then $z = f \circ g(x)$, and the derivative of z with respect to x is the derivative of the function $f \circ g$.
The derivative of z with respect to y is the derivative of the function f, and the derivative of y with respect
to x is the derivative of the function g. In short,

$$\frac{dz}{dx} = (f \circ g)'(x), \qquad \frac{dz}{dy} = f'(y) \qquad\text{and}\qquad \frac{dy}{dx} = g'(x),$$

so that the chain rule says

$$\frac{dz}{dx} = \frac{dz}{dy}\,\frac{dy}{dx}. \tag{26}$$

First proof of the chain rule (using Leibniz’ notation). We first consider difference quotients
instead of derivatives, i.e. using the same notation as above, we consider the effect of an increase of x by an
amount $\Delta x$ on the quantity z.

If x increases by $\Delta x$, then $y = g(x)$ will increase by

$$\Delta y = g(x + \Delta x) - g(x),$$

and $z = f(y)$ will increase by

$$\Delta z = f(y + \Delta y) - f(y).$$

The ratio of the increase in $z = f(g(x))$ to the increase in x is

$$\frac{\Delta z}{\Delta x} = \frac{\Delta z}{\Delta y}\cdot\frac{\Delta y}{\Delta x}.$$

In contrast to dx, dy and dz in equation (26), the $\Delta x$, etc. here are finite quantities, so this equation is just
algebra: you can cancel the two $\Delta y$s. If you let the increase $\Delta x$ go to zero, then the increase $\Delta y$ will also go
to zero, and the difference quotients converge to the derivatives,

$$\frac{\Delta z}{\Delta x} \to \frac{dz}{dx}, \qquad \frac{\Delta z}{\Delta y} \to \frac{dz}{dy}, \qquad \frac{\Delta y}{\Delta x} \to \frac{dy}{dx},$$

which immediately leads to Leibniz’ form of the chain rule. □


PROOF of the CHAIN RULE. We verify the formula in Theorem 13.2 at some arbitrary value $x = a$, i.e.
we will show that

$$(f \circ g)'(a) = f'(g(a))\, g'(a).$$

By definition the left hand side is

$$(f \circ g)'(a) = \lim_{x \to a} \frac{(f \circ g)(x) - (f \circ g)(a)}{x - a} = \lim_{x \to a} \frac{f(g(x)) - f(g(a))}{x - a}.$$

The two derivatives on the right hand side are given by

$$g'(a) = \lim_{x \to a} \frac{g(x) - g(a)}{x - a}$$

and

$$f'(g(a)) = \lim_{y \to g(a)} \frac{f(y) - f(g(a))}{y - g(a)}.$$

Since $g$ is a differentiable function it must also be a continuous function, and hence $\lim_{x \to a} g(x) = g(a)$. So
we can substitute $y = g(x)$ in the limit defining $f'(g(a))$:

$$f'(g(a)) = \lim_{y \to g(a)} \frac{f(y) - f(g(a))}{y - g(a)} = \lim_{x \to a} \frac{f(g(x)) - f(g(a))}{g(x) - g(a)}. \tag{27}$$

Put all this together and you get

$$\begin{aligned}
(f \circ g)'(a) &= \lim_{x \to a} \frac{f(g(x)) - f(g(a))}{x - a} \\
&= \lim_{x \to a} \frac{f(g(x)) - f(g(a))}{g(x) - g(a)} \cdot \frac{g(x) - g(a)}{x - a} \\
&= \lim_{x \to a} \frac{f(g(x)) - f(g(a))}{g(x) - g(a)} \cdot \lim_{x \to a} \frac{g(x) - g(a)}{x - a} \\
&= f'(g(a)) \cdot g'(a)
\end{aligned}$$

which is what we were supposed to prove; the proof seems complete.

There is one flaw in this proof, namely, we have divided by $g(x) - g(a)$, which is not allowed when
$g(x) - g(a) = 0$. This flaw can be fixed but we will not go into the details here.² □

13.3. First example. We go back to the functions

$$z = f(y) = y^2 + y \quad\text{and}\quad y = g(x) = 2x + 1$$

from the beginning of this section. The composition of these two functions is

$$z = f(g(x)) = (2x + 1)^2 + (2x + 1) = 4x^2 + 6x + 2.$$

We can compute the derivative of this composed function, i.e. the derivative of $z$ with respect to $x$, in two
ways. First, you simply differentiate the last formula we have:

$$\frac{dz}{dx} = \frac{d(4x^2 + 6x + 2)}{dx} = 8x + 6. \tag{28}$$

The other approach is to use the chain rule:

$$\frac{dz}{dy} = \frac{d(y^2 + y)}{dy} = 2y + 1,$$

and

$$\frac{dy}{dx} = \frac{d(2x + 1)}{dx} = 2.$$

Hence, by the chain rule one has

$$\frac{dz}{dx} = \frac{dz}{dy}\,\frac{dy}{dx} = (2y + 1) \cdot 2 = 4y + 2. \tag{29}$$

The two answers (28) and (29) should be the same. Once you remember that $y = 2x + 1$ you see that this is
indeed true:

$$y = 2x + 1 \implies 4y + 2 = 4(2x + 1) + 2 = 8x + 6.$$

The two computations of $dz/dx$ therefore lead to the same answer. In this example there was no clear
advantage in using the chain rule. The chain rule becomes useful when the functions $f$ and $g$ become more
complicated.
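
The agreement between (28) and (29) can also be checked numerically. The sketch below is only an illustration: the helper `central_difference` and its step size are assumptions introduced here, and it approximates the derivative rather than computing it exactly.

```python
def central_difference(func, x, h=1e-6):
    """Approximate func'(x) by a symmetric difference quotient."""
    return (func(x + h) - func(x - h)) / (2 * h)

def z_of_x(x):
    y = 2*x + 1          # inner function y = g(x)
    return y**2 + y      # outer function z = f(y)

def chain_rule_derivative(x):
    y = 2*x + 1
    return (2*y + 1) * 2  # dz/dy times dy/dx, formula (29)

for x in (-1.0, 0.0, 2.5):
    direct = 8*x + 6      # formula (28)
    print(x, direct, chain_rule_derivative(x), round(central_difference(z_of_x, x), 4))
```

All three columns should agree up to rounding in the last one.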


² Briefly, you have to show that the function

$$h(y) = \begin{cases} \dfrac{f(y) - f(g(a))}{y - g(a)} & y \neq g(a) \\[2ex] f'(g(a)) & y = g(a) \end{cases}$$

is continuous.

13.4. Example where you really need the Chain Rule. We know what the derivative of $\sin x$ with
respect to $x$ is, but none of the rules we have found so far tell us how to differentiate $f(x) = \sin(2x)$.

The function $f(x) = \sin 2x$ is the composition of two simpler functions, namely

$$f(x) = g(h(x)) \quad\text{where}\quad g(u) = \sin u \text{ and } h(x) = 2x.$$

We know how to differentiate each of the two functions $g$ and $h$:

$$g'(u) = \cos u, \quad h'(x) = 2.$$

Therefore the chain rule implies that

$$f'(x) = g'(h(x))\,h'(x) = \cos(2x) \cdot 2 = 2\cos 2x.$$

Leibniz would have decomposed the relation $y = \sin 2x$ between $y$ and $x$ as

$$y = \sin u, \quad u = 2x$$

and then computed the derivative of $\sin 2x$ with respect to $x$ as follows

$$\frac{d \sin 2x}{dx} \overset{u = 2x}{=} \frac{d \sin u}{dx} = \frac{d \sin u}{du} \cdot \frac{du}{dx} = \cos u \cdot 2 = 2\cos 2x.$$

13.5. The Power Rule and the Chain Rule. The Power Rule, which says that for any function $f$
and any rational number $n$ one has

$$\frac{d}{dx}\bigl(f(x)^n\bigr) = n f(x)^{n-1} f'(x),$$

is a special case of the Chain Rule, for one can regard $y = f(x)^n$ as the composition of two functions

$$y = g(u), \quad u = f(x)$$

where $g(u) = u^n$. Since $g'(u) = n u^{n-1}$ the Chain Rule implies that

$$\frac{du^n}{dx} = \frac{du^n}{du} \cdot \frac{du}{dx} = n u^{n-1} \frac{du}{dx}.$$

Setting $u = f(x)$ and $\frac{du}{dx} = f'(x)$ then gives you the Power Rule.


13.6. The volume of an inflating balloon. Consider the “real world example” from page 53 again.
There we considered a growing water balloon of radius

$$r = f(t).$$

The volume of this balloon is

$$V = \tfrac43 \pi r^3 = \tfrac43 \pi f(t)^3.$$

We can regard this as the composition of two functions, $V = g(r) = \frac43 \pi r^3$ and $r = f(t)$.
According to the chain rule the rate of change of the volume with time is now

$$\frac{dV}{dt} = \frac{dV}{dr}\,\frac{dr}{dt},$$

i.e. it is the product of the rate of change of the volume with the radius of the balloon and the rate of change
of the balloon’s radius with time. From

$$\frac{dV}{dr} = 4\pi r^2$$

we see that

$$\frac{dV}{dt} = 4\pi r^2 \frac{dr}{dt}.$$

For instance, if the radius of the balloon is growing at 0.5 inch/sec, and if its radius is $r = 3.0$ inch, then the
volume is growing at a rate of

$$\frac{dV}{dt} = 4\pi (3.0\,\text{inch})^2 \times 0.5\,\text{inch/sec} \approx 57\,\text{inch}^3/\text{sec}.$$
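
The arithmetic in this last step is easy to reproduce. The following sketch assumes a perfectly spherical balloon, as in the text; the function name `volume_growth_rate` is made up for this illustration.

```python
import math

def volume_growth_rate(radius, radius_rate):
    """dV/dt = 4*pi*r^2 * dr/dt for a sphere of radius r."""
    return 4 * math.pi * radius**2 * radius_rate

# r = 3.0 inch, dr/dt = 0.5 inch/sec
rate = volume_growth_rate(radius=3.0, radius_rate=0.5)
print(f"{rate:.1f} cubic inches per second")
```

The exact value is $18\pi$ cubic inches per second, which rounds to the 57 quoted above.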
13.7. A more complicated example. Suppose you needed to find the derivative of

$$y = h(x) = \frac{\sqrt{x + 1}}{(\sqrt{x + 1} + 1)^2}.$$

We can write this function as a composition of two simpler functions, namely,

$$y = f(u), \quad u = g(x),$$

with

$$f(u) = \frac{u}{(u + 1)^2} \quad\text{and}\quad g(x) = \sqrt{x + 1}.$$

The derivatives of $f$ and $g$ are

$$f'(u) = \frac{1 \cdot (u + 1)^2 - u \cdot 2(u + 1)}{(u + 1)^4} = \frac{u + 1 - 2u}{(u + 1)^3} = \frac{1 - u}{(u + 1)^3},$$

and

$$g'(x) = \frac{1}{2\sqrt{x + 1}}.$$

Hence the derivative of the composition is

$$h'(x) = \frac{d}{dx}\left(\frac{\sqrt{x + 1}}{(\sqrt{x + 1} + 1)^2}\right) = f'(u)\,g'(x) = \frac{1 - u}{(u + 1)^3} \cdot \frac{1}{2\sqrt{x + 1}}.$$

The result should be a function of $x$, and we achieve this by replacing all $u$’s with $u = \sqrt{x + 1}$:

$$\frac{d}{dx}\left(\frac{\sqrt{x + 1}}{(\sqrt{x + 1} + 1)^2}\right) = \frac{1 - \sqrt{x + 1}}{(\sqrt{x + 1} + 1)^3} \cdot \frac{1}{2\sqrt{x + 1}}.$$

The last step (where you replace $u$ by its definition in terms of $x$) is important because the problem was
presented to you with only $x$ and $y$ as variables while $u$ was a variable you introduced yourself to do the
problem.
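
A quick numerical sanity check of this kind of computation is often worthwhile, since sign slips in the quotient rule are easy to make. The sketch below takes $f'(u) = (1 - u)/(u + 1)^3$, as obtained from the quotient rule, and compares the chain rule result with a difference quotient; the helper names and the step size are assumptions of this illustration.

```python
import math

def h(x):
    root = math.sqrt(x + 1)
    return root / (root + 1)**2

def h_prime(x):
    """Chain rule result f'(u) * g'(x) with u = sqrt(x + 1)."""
    u = math.sqrt(x + 1)
    return (1 - u) / (u + 1)**3 * 1 / (2 * u)

def central_difference(func, x, step=1e-6):
    """Approximate func'(x) by a symmetric difference quotient."""
    return (func(x + step) - func(x - step)) / (2 * step)

for x in (0.0, 3.0, 8.0):
    print(x, h_prime(x), central_difference(h, x))
```

At $x = 3$ one has $u = 2$, so the formula gives $-\frac{1}{27} \cdot \frac14 = -\frac{1}{108}$, and the difference quotient should agree to several decimal places.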


Sometimes it is possible to apply the Chain Rule without introducing new letters, and you will simply
think “the derivative is the derivative of the outside with respect to the inside times the derivative of the
inside.” For instance, to compute

$$\frac{d\,(4 + \sqrt{7 + x^3})}{dx}$$

you could set $u = 7 + x^3$, and compute

$$\frac{d\,(4 + \sqrt{7 + x^3})}{dx} = \frac{d\,(4 + \sqrt{u})}{du} \cdot \frac{du}{dx}.$$

Instead of writing all this explicitly, you could think of $u = 7 + x^3$ as the function “inside the square root,”
and think of $4 + \sqrt{u}$ as “the outside function.” You would then immediately write

$$\frac{d}{dx}\bigl(4 + \sqrt{7 + x^3}\bigr) = \frac{1}{2\sqrt{7 + x^3}} \cdot 3x^2.$$


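13.8. The Chain Rule and composing more than two functions. Often we have to apply the
Chain Rule more than once to compute a derivative. Thus if $y = f(u)$, $u = g(v)$, and $v = h(x)$ we have

$$\frac{dy}{dx} = \frac{dy}{du} \cdot \frac{du}{dv} \cdot \frac{dv}{dx}.$$

In functional notation this is

$$(f \circ g \circ h)'(x) = f'(g(h(x))) \cdot g'(h(x)) \cdot h'(x).$$

Note that each of the three derivatives on the right is evaluated at a different point. Thus if $b = h(a)$ and
$c = g(b)$ the Chain Rule is

$$\left.\frac{dy}{dx}\right|_{x=a} = \left.\frac{dy}{du}\right|_{u=c} \cdot \left.\frac{du}{dv}\right|_{v=b} \cdot \left.\frac{dv}{dx}\right|_{x=a}.$$

For example, if

$$y = \frac{1}{1 + \sqrt{9 + x^2}}$$

then $y = 1/u$ where $u = 1 + \sqrt{v}$ and $v = 9 + x^2$, so

$$\frac{dy}{dx} = \frac{dy}{du} \cdot \frac{du}{dv} \cdot \frac{dv}{dx} = -\frac{1}{u^2} \cdot \frac{1}{2\sqrt{v}} \cdot 2x,$$

so

$$\left.\frac{dy}{dx}\right|_{x=4} = \left.\frac{dy}{du}\right|_{u=6} \cdot \left.\frac{du}{dv}\right|_{v=25} \cdot \left.\frac{dv}{dx}\right|_{x=4} = -\frac{1}{36} \cdot \frac{1}{10} \cdot 8 = -\frac{1}{45}.$$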
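
The three-link chain can be traced step by step in code. This sketch takes $y = 1/(1 + \sqrt{9 + x^2})$ and assumes the decomposition $y = 1/u$, $u = 1 + \sqrt{v}$, $v = 9 + x^2$; the function names are illustrative only.

```python
import math

def y(x):
    return 1 / (1 + math.sqrt(9 + x**2))

def dy_dx(x):
    """Multiply the three derivatives, each evaluated at its own point."""
    v = 9 + x**2
    u = 1 + math.sqrt(v)
    dy_du = -1 / u**2                 # evaluated at u
    du_dv = 1 / (2 * math.sqrt(v))    # evaluated at v
    dv_dx = 2 * x                     # evaluated at x
    return dy_du * du_dv * dv_dx

def central_difference(func, x, step=1e-6):
    return (func(x + step) - func(x - step)) / (2 * step)

print(dy_dx(4.0), central_difference(y, 4.0))
```

At $x = 4$ the intermediate values are $v = 25$ and $u = 6$, and both printed numbers should be close to $-1/45$.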


14. Exercises 


149. Let $y = \sqrt{1 + x^3}$ and find $dy/dx$ using the Chain Rule.
Say what plays the role of $y = f(u)$ and $u = g(x)$.

150. Repeat the previous exercise with

$$y = (1 + \sqrt{1 + x})^3.$$

151. Alice and Bob differentiated $y = \sqrt{1 + x^3}$ with respect
to $x$ differently. Alice wrote $y = \sqrt{u}$ and $u = 1 + x^3$ while
Bob wrote $y = \sqrt{1 + v}$ and $v = x^3$. Assuming neither
one made a mistake, did they get the same answer?

152. Let $y = u^3 + 1$ and $u = 3x + 7$. Find $\frac{dy}{dx}$ and $\frac{dy}{du}$.
Express the former in terms of $x$ and the latter in terms
of $u$.

153. Suppose that $f(x) = \sqrt{x}$, $g(x) = 1 + x^2$, $v(x) = f \circ g(x)$,
$w(x) = g \circ f(x)$. Find formulas for $v(x)$, $w(x)$,
$v'(x)$, and $w'(x)$.

Compute the following derivatives

154. $f(x) = \sin 2x - \cos 3x$

155. $f(x) = \sin \dfrac{\pi}{x}$

156. $f(x) = \sin(\cos 3x)$

157. $f(x) = \dfrac{\sin x^2}{x^2}$

158. $f(x) = \tan \sqrt{1 + x^2}$

159. $f(x) = \cos^2 x - \cos x^2$

160. Group Problem.

Moe is pouring water into a glass. At time $t$ (seconds)
the height of the water in the glass is $h(t)$ (inch).
The ACME glass company, which made the glass, says
that the volume in the glass to height $h$ is $V = 1.2\,h^2$
(fluid ounces).

(a) The water height in the glass is rising at 2 inches
per second at the moment that the height is 2 inches. How
fast is Moe pouring water into the glass?

(b) If Moe pours water at a rate of 1 ounce per
second, then how fast is the water level in the glass going
up when it is 3 inches?

(c) Moe pours water at 1 ounce per second, and at
some moment the water level is going up at 0.5 inch per
second. What is the water level at that moment?


161. Find the derivative of $f(x) = x \cos \dfrac{\pi}{x}$ at the point $C$
in Figure 3.

162. Suppose that $f(x) = x^2 + 1$, $g(x) = x + 5$, and

$$v = f \circ g, \quad w = g \circ f, \quad p = f \cdot g, \quad q = g \cdot f.$$

Find $v(x)$, $w(x)$, $p(x)$, and $q(x)$.

163. Group Problem.

Suppose that the functions $f$ and $g$ and their derivatives
with respect to $x$ have the following values at $x = 0$
and $x = 1$.

    x     f(x)    g(x)    f'(x)    g'(x)
    0      1       1       5       1/3
    1      3      -4     -1/3     -8/3

Define

$$v(x) = f(g(x)), \quad w(x) = g(f(x)),$$

$$p(x) = f(x)g(x), \quad q(x) = g(x)f(x).$$

Evaluate $v(0)$, $w(0)$, $p(0)$, $q(0)$, $v'(0)$ and $w'(0)$, $p'(0)$,
$q'(0)$. If there is insufficient information to answer the
question, so indicate.

164. A differentiable function $f$ satisfies $f(3) = 5$, $f(9) = 7$,
$f'(3) = 11$ and $f'(9) = 13$. Find an equation for
the tangent line to the curve $y = f(x^2)$ at the point
$(x, y) = (3, 7)$.

165. There is a function $f$ whose second derivative satisfies

$$f''(x) = -64 f(x). \tag{†}$$

(a) One such function is $f(x) = \sin ax$, provided
you choose the right constant $a$: Which value should $a$
have?

(b) For which choices of the constants $A$, $a$ and $b$
does the function $f(x) = A \sin(ax + b)$ satisfy (†)?

166. Group Problem.

A cubical sponge, hereafter referred to as ‘Bob’, is
absorbing water, which causes him to expand. His side
at time $t$ is $S(t)$. His volume is $V(t)$.

(a) What is the relation between $S(t)$ and $V(t)$, i.e.
can you find a function $f$ so that $V(t) = f(S(t))$?

(b) Describe the meaning of the derivatives $S'(t)$ and
$V'(t)$ in one plain English sentence each. If we measure
lengths in inches and time in minutes, then what units
do $t$, $S(t)$, $V(t)$, $S'(t)$ and $V'(t)$ have?

(c) What is the relation between $S'(t)$ and $V'(t)$?

(d) At the moment that Bob’s volume is 8 cubic
inches, he is absorbing water at a rate of 2 cubic inches per
minute. How fast is his side $S(t)$ growing?


15. Implicit differentiation 


15.1. The recipe. Recall that an implicitly defined function is a function $y = f(x)$ which is defined
by an equation of the form

$$F(x, y) = 0.$$

We call this equation the defining equation for the function $y = f(x)$. To find $y = f(x)$ for a given value of
$x$ you must solve the defining equation $F(x, y) = 0$ for $y$.

Here is a recipe for computing the derivative of an implicitly defined function.

(1) Differentiate the equation $F(x, y) = 0$; you may need the chain rule to deal with the occurrences of $y$
in $F(x, y)$;

(2) You can rearrange the terms in the result of step 1 so as to get an equation of the form

$$G(x, y)\frac{dy}{dx} + H(x, y) = 0, \tag{30}$$

where $G$ and $H$ are expressions containing $x$ and $y$ but not the derivative.

(3) Solve the equation in step 2 for $\frac{dy}{dx}$:

$$\frac{dy}{dx} = -\frac{H(x, y)}{G(x, y)}. \tag{31}$$

(4) If you also have an explicit description of the function (i.e. a formula expressing $y = f(x)$ in terms
of $x$) then you can substitute $y = f(x)$ in the expression (31) to get a formula for $dy/dx$ in terms of
$x$ only.

Often no explicit formula for $y$ is available and you can’t take this last step. In that case (31) is
as far as you can go.

Observe that by following this procedure you will get a formula for the derivative $\frac{dy}{dx}$ which contains both $x$
and $y$.


15.2. Dealing with equations of the form $F_1(x, y) = F_2(x, y)$. If the implicit definition of the
function is not of the form $F(x, y) = 0$ but rather of the form $F_1(x, y) = F_2(x, y)$ then you move all terms to
the left hand side, and proceed as above. E.g. to deal with a function $y = f(x)$ which satisfies

$$y^2 + x = xy$$

you rewrite this equation as

$$y^2 + x - xy = 0$$

and set $F(x, y) = y^2 + x - xy$.


15.3. Example: Derivative of $\sqrt[4]{1 - x^4}$. Consider the function

$$f(x) = \sqrt[4]{1 - x^4}, \quad -1 \le x \le 1.$$

We will compute its derivative in two ways: first the direct method, and then using the method of implicit
differentiation (i.e. the recipe above).

The direct approach goes like this:

$$f'(x) = \frac{d(1 - x^4)^{1/4}}{dx} = \tfrac14 (1 - x^4)^{-3/4}\, \frac{d(1 - x^4)}{dx} = -\frac{x^3}{(1 - x^4)^{3/4}}.$$

To find the derivative using implicit differentiation we must first find a nice implicit description of the
function. For instance, we could decide to get rid of all roots or fractional exponents in the function and
point out that $y = \sqrt[4]{1 - x^4}$ satisfies the equation $y^4 = 1 - x^4$. So our implicit description of the function
$y = f(x) = \sqrt[4]{1 - x^4}$ is

$$x^4 + y^4 - 1 = 0;$$

the defining function is therefore $F(x, y) = x^4 + y^4 - 1$.

Differentiate both sides with respect to $x$ (and remember that $y = f(x)$, so $y$ here is a function of $x$), and
you get

$$\frac{dx^4}{dx} + \frac{dy^4}{dx} - \frac{d1}{dx} = \frac{d0}{dx} \implies 4x^3 + 4y^3 \frac{dy}{dx} = 0.$$

The expressions $G$ and $H$ from equation (30) in the recipe are $G(x, y) = 4y^3$ and $H(x, y) = 4x^3$.

This last equation can be solved for $dy/dx$:

$$\frac{dy}{dx} = -\frac{x^3}{y^3}.$$

This is a nice and short form of the derivative, but it contains $y$ as well as $x$. To express $dy/dx$ in terms of $x$
only, and remove the $y$ dependency we use $y = \sqrt[4]{1 - x^4}$. The result is

$$f'(x) = \frac{dy}{dx} = -\frac{x^3}{y^3} = -\frac{x^3}{(1 - x^4)^{3/4}}.$$
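
That the direct and the implicit computation agree can be confirmed at sample points. In this sketch the names `direct_derivative` and `implicit_derivative` are invented for illustration; `G` and `H` are the expressions $4y^3$ and $4x^3$ from the recipe.

```python
def direct_derivative(x):
    """f'(x) = -x^3 / (1 - x^4)^(3/4), from the direct method."""
    return -x**3 / (1 - x**4)**0.75

def implicit_derivative(x, y):
    """dy/dx = -H/G with G = 4y^3 and H = 4x^3, from the recipe."""
    G = 4 * y**3
    H = 4 * x**3
    return -H / G

for x in (-0.5, 0.2, 0.9):
    y = (1 - x**4)**0.25          # the point (x, y) on the curve
    print(x, direct_derivative(x), implicit_derivative(x, y))
```

The implicit formula needs a point $(x, y)$ on the curve, which is why $y$ is computed first.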

15.4. Another example. Let $f$ be a function defined by

$$y = f(x) \iff 2y + \sin y = x, \quad\text{i.e.}\quad 2y + \sin y - x = 0.$$

For instance, if $x = 2\pi$ then $y = \pi$, i.e. $f(2\pi) = \pi$.

To find the derivative $dy/dx$ we differentiate the defining equation

$$\frac{d(2y + \sin y - x)}{dx} = \frac{d0}{dx} \implies 2\frac{dy}{dx} + \cos y\,\frac{dy}{dx} - \frac{dx}{dx} = 0 \implies (2 + \cos y)\frac{dy}{dx} - 1 = 0.$$

Solve for $\frac{dy}{dx}$ and you get

$$f'(x) = \frac{1}{2 + \cos y} = \frac{1}{2 + \cos f(x)}.$$

If we were asked to find $f'(2\pi)$ then, since we know $f(2\pi) = \pi$, we could answer

$$f'(2\pi) = \frac{1}{2 + \cos \pi} = \frac{1}{2 - 1} = 1.$$

If we were asked $f'(\pi/2)$, then all we would be able to say is

$$f'(\pi/2) = \frac{1}{2 + \cos f(\pi/2)}.$$

To say more we would first have to find $y = f(\pi/2)$, which one does by solving

$$2y + \sin y = \frac{\pi}{2}.$$
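
An equation like this has no closed-form solution, but it can be solved numerically. The sketch below uses Newton's method, which is a technique brought in here for illustration and not one used in the text; the function names, starting guess and tolerance are all assumptions.

```python
import math

def solve_for_y(x, guess=0.0, tolerance=1e-12, max_steps=50):
    """Solve 2y + sin(y) = x for y with Newton's method."""
    y = guess
    for _ in range(max_steps):
        residual = 2*y + math.sin(y) - x
        if abs(residual) < tolerance:
            break
        # The derivative 2 + cos(y) is at least 1, so we never divide by zero.
        y -= residual / (2 + math.cos(y))
    return y

def f_prime(x):
    """f'(x) = 1 / (2 + cos f(x)), with f(x) found numerically."""
    return 1 / (2 + math.cos(solve_for_y(x)))

print(solve_for_y(2 * math.pi), f_prime(2 * math.pi))   # close to pi and 1
print(solve_for_y(math.pi / 2), f_prime(math.pi / 2))
```

The first line reproduces $f(2\pi) = \pi$ and $f'(2\pi) = 1$ approximately; the second gives numerical values for $f(\pi/2)$ and $f'(\pi/2)$.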
15.5. Derivatives of Arc Sine and Arc Tangent. Recall that

$$y = \arcsin x \iff x = \sin y \text{ and } -\tfrac{\pi}{2} \le y \le \tfrac{\pi}{2},$$

and

$$y = \arctan x \iff x = \tan y \text{ and } -\tfrac{\pi}{2} < y < \tfrac{\pi}{2}.$$

15.6. Theorem.

$$\frac{d \arcsin x}{dx} = \frac{1}{\sqrt{1 - x^2}}$$

$$\frac{d \arctan x}{dx} = \frac{1}{1 + x^2}$$

PROOF. If $y = \arcsin x$ then $x = \sin y$. Differentiate this relation

$$\frac{dx}{dx} = \frac{d \sin y}{dx}$$

and apply the chain rule. You get

$$1 = (\cos y)\frac{dy}{dx}$$

and hence

$$\frac{dy}{dx} = \frac{1}{\cos y}.$$

How do we get rid of the $y$ on the right hand side? We know $x = \sin y$, and also $-\frac{\pi}{2} \le y \le \frac{\pi}{2}$. Therefore

$$\sin^2 y + \cos^2 y = 1 \implies \cos y = \pm\sqrt{1 - \sin^2 y} = \pm\sqrt{1 - x^2}.$$

Since $-\frac{\pi}{2} \le y \le \frac{\pi}{2}$ we know that $\cos y \ge 0$, so we must choose the positive square root. This leaves us with
$\cos y = \sqrt{1 - x^2}$, and hence

$$\frac{dy}{dx} = \frac{1}{\sqrt{1 - x^2}}.$$

The derivative of $\arctan x$ is found in the same way, and you should really do this yourself. □
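
The key step of the proof, replacing $1/\cos y$ by $1/\sqrt{1 - x^2}$, can be checked numerically. This is only a spot check at a few values of $x$, not a proof, and the function names are invented for this sketch.

```python
import math

def arcsin_slope_from_proof(x):
    """dy/dx = 1 / cos(y) with y = arcsin(x)."""
    return 1 / math.cos(math.asin(x))

def arcsin_slope_from_theorem(x):
    """dy/dx = 1 / sqrt(1 - x^2)."""
    return 1 / math.sqrt(1 - x**2)

for x in (-0.9, -0.3, 0.0, 0.5, 0.8):
    print(x, arcsin_slope_from_proof(x), arcsin_slope_from_theorem(x))
```

The two columns agree because $\cos y$ is non-negative on the range of the arc sine, which is exactly the reason given for choosing the positive square root.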

16. Exercises 


For each of the following problems find the derivative
$f'(x)$ if $y = f(x)$ satisfies the given equation. State what
the expressions $F(x, y)$, $G(x, y)$ and $H(x, y)$ from the
recipe in the beginning of this section are.

If you can find an explicit description of the function
$y = f(x)$, say what it is.

167. $xy = \dfrac{\pi}{6}$

168. $\sin(xy) = \dfrac12$

169. $\dfrac{xy}{x + y} = 1$

170. $x + y = xy$

171. $(y - 1)^2 + x = 0$

172. $(y + 1)^2 + y - x = 0$

173. $(y - x)^2 + x = 0$

174. $(y + x)^2 + 2y - x = 0$

175. $(y^2 - 1)^2 + x = 0$

176. $(y^2 + 1)^2 - x = 0$

177. $x^3 + xy + y^3 = 3$

178. $\sin x + \sin y = 1$

179. $\sin x + xy + y^5 = \pi$

180. $\tan x + \tan y = 1$

For each of the following explicitly defined functions 
find an implicit definition which does not involve taking 
roots. Then use this description to find the derivative 

dy/dx. 

181. y = f(x) — Vi - x 

182. y = f[x ) = Vx + x 2 

183. y = f{x) = V 1 ~ V x 

184. y = f(x) = V x ~ V x 

185. y — f(x) = V%x +T — x 2 

186. y = f(x) = V x + x 2 
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y = f{x) = \]x- v / 2®TT 


187. 


188. y = f(x) = 

189. Group Problem.

(Inverse trig review) Simplify the following expressions,
and indicate for which values of $x$ (or $\theta$, or ...)
your simplification is valid. In case of doubt, try plotting
the function on a graphing calculator.

(a) $\sin \arcsin x$

(b) $\cos \arcsin x$

(c) $\arctan(\tan \theta)$

(d) $\cot \arctan x$

(e) $\tan \arctan z$

(f) $\tan \arcsin \theta$

(g) $\arcsin(\sin \theta)$

(h) $\cot \arcsin x$

Now that you know the derivatives of arcsin and
arctan, you can find the derivatives of the following functions.
What are they?

190. $f(x) = \arcsin(2x)$

191. $f(x) = \arcsin \sqrt{x}$

192. $f(x) = \arctan(\sin x)$

193. $f(x) = \sin \arctan x$

194. $f(x) = (\arcsin x)^2$

195. $f(x) = \dfrac{1}{1 + (\arctan x)^2}$

196. $f(x) = \sqrt{1 - (\arcsin x)^2}$

197. $f(x) = \dfrac{\arctan x}{\arcsin x}$

PROBLEMS ON 
RELATED RATES 

198. A 10 foot long pole has one end ($B$) on the floor and
another ($A$) against a wall. If the bottom of the pole is
8 feet away from the wall, and if it is sliding away from
the wall at 7 feet per second, then with what speed is the
top ($A$) going down?

199. A pole 10 feet long rests against a vertical wall. If the
bottom of the pole slides away from the wall at a speed of
2 ft/s, how fast is the angle between the top of the pole
and the wall changing when the angle is $\pi/4$ radians?


200. A pole 13 meters long is leaning against a wall. The 
bottom of the pole is pulled along the ground away from 
the wall at the rate of 2 m/s. How fast is its height on 
the wall decreasing when the foot of the pole is 5 m away 
from the wall? 

201. Group Problem. 

A television camera is positioned 4000 ft from the
base of a rocket launching pad. A rocket rises vertically
and its speed is 600 ft/s when it has risen 3000 feet.

(a) How fast is the distance from the television camera
to the rocket changing at that moment?

(b) How fast is the camera’s angle of elevation changing
at that same moment? (Assume that the television
camera points toward the rocket.)

202. Group Problem.

A 2-foot tall dog is walking away from a streetlight
which is on a 10-foot pole. At a certain moment, the tip
of the dog’s shadow is moving away from the streetlight
at 5 feet per second. How fast is the dog walking at that
moment?

203. An isosceles triangle is changing its shape: the lengths
of the two equal sides remain fixed at 2 inches, but the
angle $\theta(t)$ between them changes.

Let $A(t)$ be the area of the triangle at time $t$. If
the area increases at a constant rate of 0.5 inch²/sec,
then how fast is the angle increasing or decreasing when
$\theta = 60°$?

204. A point $P$ is moving in the first quadrant of the plane.
Its motion is parallel to the $x$-axis; its distance to the
$x$-axis is always 10 (feet). Its velocity is 3 feet per second
to the left. We write $\theta$ for the angle between the positive
$x$-axis and the line segment from the origin to $P$.

(a) Make a drawing of the point $P$.

(b) Where is the point when $\theta = \pi/3$?

(c) Compute the rate of change of the angle $\theta$ at
the moment that $\theta = \frac{\pi}{3}$.

205. The point $Q$ is moving on the line $y = x$ with velocity 3
m/sec. Find the rate of change of the following quantities
at the moment in which $Q$ is at the point $(1, 1)$:

(a) the distance from $Q$ to the origin,

(b) the distance from $Q$ to the point $R(2, 0)$,

(c) the angle $\angle ORQ$ where $R$ is again the point
$R(2, 0)$.

206. A point $P$ is sliding on the parabola with equation
$y = x^2$. Its $x$-coordinate is increasing at a constant rate
of 2 feet/minute.

Find the rate of change of the following quantities
at the moment that $P$ is at $(3, 9)$:

(a) the distance from $P$ to the origin,

(b) the area of the rectangle whose lower left corner
is the origin and whose upper right corner is $P$,

(c) the slope of the tangent to the parabola at $P$,

(d) the angle $\angle OPQ$ where $Q$ is the point $(0, 3)$.

207. Group Problem. 

A certain amount of gas is trapped in a cylinder with
a piston. The ideal gas law from thermodynamics says
that if the cylinder is not heated, and if the piston moves
slowly, then one has

$$pV = CT$$

where $p$ is the pressure in the gas, $V$ is its volume, $T$
its temperature (in degrees Kelvin) and $C$ is a constant
depending on the amount of gas trapped in the cylinder.

(a) If the pressure is 10 psi (pounds per square inch),
if the volume is 25 inch³, and if the piston is moving so
that the gas volume is expanding at a rate of 2 inch³ per
minute, then what is the rate of change of the pressure?

(b) The ideal gas law turns out to be only approximately
true. A more accurate description of gases is given
by van der Waals’ equation of state, which says that

$$\left(p + \frac{a}{V^2}\right)(V - b) = C$$

where $a$, $b$, $C$ are constants depending on the temperature
and the amount and type of gas in the cylinder.

Suppose that the cylinder contains fictitious gas for
which one has $a = 12$ and $b = 3$. Suppose that at some
moment the volume of gas is 12 inch³, the pressure is 25 psi
and suppose the gas is expanding at 2 inch³ per minute.
Then how fast is the pressure changing?

Graph Sketching and Max-Min Problems


The signs of the first and second derivatives of a function tell us something about the shape of its graph. 
In this chapter we learn how to find that information. 

1. Tangent and Normal lines to a graph 

The slope of the tangent to the graph of $f$ at the point $(a, f(a))$ is

$$m = f'(a) \tag{32}$$

and hence the equation for the tangent is

$$y = f(a) + f'(a)(x - a). \tag{33}$$

The slope of the normal line to the graph is $-1/m$ and thus one could write the equation for the normal as

$$y = f(a) - \frac{x - a}{f'(a)}. \tag{34}$$

When $f'(a) = 0$ the tangent is horizontal, and hence the normal is vertical. In this case the equation for the
normal cannot be written as in (34), but instead one gets the simpler equation

$$x = a.$$

Both cases are covered by this form of the equation for the normal

$$x = a + f'(a)(f(a) - y). \tag{35}$$

Both (35) and (34) are formulas that you shouldn’t try to remember. It is easier to remember that if the 



2. The Intermediate Value Theorem 

It is said that a function is continuous if you can draw its graph without taking your pencil off the paper.
A more precise version of this statement is the Intermediate Value Theorem:

Figure 2. The Intermediate Value Theorem says that a continuous function must attain any given value
$y$ between $f(a)$ and $f(b)$ at least once. In this example there are three values of $c$ for which $f(c) = y$
holds.

2.1. Intermediate Value Theorem. If $f$ is a continuous function on an interval $a \le x \le b$, and if $y$
is some number between $f(a)$ and $f(b)$, then there is a number $c$ with $a \le c \le b$ such that $f(c) = y$.

Here “$y$ between $f(a)$ and $f(b)$” means that $f(a) \le y \le f(b)$ if $f(a) \le f(b)$, and $f(b) \le y \le f(a)$ if
$f(b) \le f(a)$.

2.2. Example: Square root of 2. Consider the function $f(x) = x^2$. Since $f(1) < 2$ and $f(2) = 4 > 2$
the intermediate value theorem with $a = 1$, $b = 2$, $y = 2$ tells us that there is a number $c$ between 1 and 2
such that $f(c) = 2$, i.e. for which $c^2 = 2$. So the theorem tells us that the square root of 2 exists.

2.3. Example: The equation $\theta + \sin\theta = \frac{\pi}{2}$. Consider the function $f(x) = x + \sin x$. It is a continuous
function at all $x$, so from $f(0) = 0$ and $f(\pi) = \pi$ it follows that there is a number $\theta$ between 0 and $\pi$ such
that $f(\theta) = \pi/2$. In other words, the equation

$$\theta + \sin\theta = \frac{\pi}{2} \tag{36}$$

has a solution $\theta$ with $0 \le \theta \le \pi$. Unlike the previous example, where we knew the solution was $\sqrt{2}$, there
is no simple formula for the solution to (36).
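
Although the Intermediate Value Theorem only asserts that a solution exists, repeatedly halving the interval turns the same idea into a way of approximating it. The bisection routine below is a standard technique added here as an illustration; the name `bisect`, its arguments and the tolerance are assumptions of this sketch.

```python
import math

def bisect(func, target, a, b, tolerance=1e-10):
    """Approximate c in [a, b] with func(c) = target.

    Assumes func is continuous on [a, b] and that target lies
    between func(a) and func(b), as in the theorem.
    """
    lo, hi = a, b
    increasing_ends = func(lo) <= func(hi)
    while hi - lo > tolerance:
        mid = (lo + hi) / 2
        below = func(mid) < target
        # Keep the half interval whose end values still bracket the target.
        if below == increasing_ends:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

root_two = bisect(lambda x: x * x, 2, 1, 2)
theta = bisect(lambda x: x + math.sin(x), math.pi / 2, 0, math.pi)
print(root_two, theta)
```

The first call approximates $\sqrt{2}$; the second approximates the solution of $\theta + \sin\theta = \pi/2$, for which there is no simple formula.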

2.4. Example: Solving $1/x = 0$. If we apply the intermediate value theorem to the function
$f(x) = 1/x$ on the interval $[a, b] = [-1, 1]$, then we see that for any $y$ between $f(a) = f(-1) = -1$ and
$f(b) = f(1) = 1$ there is a number $c$ in the interval $[-1, 1]$ such that $1/c = y$. For instance, we could choose
$y = 0$ (that’s between $-1$ and $+1$), and conclude that there is some $c$ with $-1 \le c \le 1$ and $1/c = 0$.

But there is no such $c$, because $1/c$ is never zero! So we have done something wrong, and the mistake we
made is that we overlooked that our function $f(x) = 1/x$ is not defined on the whole interval $-1 \le x \le 1$
because it is not defined at $x = 0$. The moral: always check the hypotheses of a theorem before you use it!

3. Exercises 


208. Where does the normal to the graph of $y = x^2$ at the
point $(1, 1)$ intersect the $x$-axis?

209. Where does the tangent to the graph of $y = x^2$ at the
point $(a, a^2)$ intersect the $x$-axis?

210. Where does the normal to the graph of $y = x^2$ at the
point $(a, a^2)$ intersect the $x$-axis?

211. Where does the normal to the graph of $y = \sqrt{x}$ at
the point $(a, \sqrt{a})$ intersect the $x$-axis?

212. Does the graph of $y = x^4 - 2x^2 + 2$ have any horizontal
tangents? If so, where?

Does the graph of the same function have any vertical
tangents?

Does it have vertical normals?

Does it have horizontal normals?

213. At some point $(a, f(a))$ on the graph of $f(x) = -1 + 2x - x^2$
the tangent to this graph goes through the
origin. Which point is it?

214. Find equations for the tangent and normal lines

    to the curve ...            at the point ...
    (a) y = 4x/(1 + x^2)        (1, 2)
    (b) y = 8/(4 + x^2)         (2, 1)
    (c) y 2 — 2x + x 2          (2, 2)
    (d) xy = 3                  (1, 3)

215. Group Problem.

The function

$$f(x) = \frac{x^2 + |x|}{x}$$

satisfies $f(-1) = -2$ and $f(+1) = +2$, so, by the Intermediate
Value Theorem, there should be some value $c$
between $-1$ and $+1$ such that $f(c) = 0$. True or False?

216. Find the equation for the tangents to the graph of the
Backward Sine at the points x = 1, x = 1 and at D (see
Figure 2 in §7.3.)

217. Find the equation for the tangent to the graph of the
Backward Cosine in a Bow Tie at the point C (see Figure
3 in §9.3)

4. Finding sign changes of a function

The intermediate value theorem implies the following very useful fact.

4.1. Theorem. If $f$ is a continuous function on some interval $a < x < b$, and if $f(x) \ne 0$ for all $x$ in
this interval, then $f(x)$ is either positive for all $a < x < b$ or else it is negative for all $a < x < b$.

PROOF. The theorem says that there can’t be two numbers $a < x_1 < x_2 < b$ such that $f(x_1)$ and $f(x_2)$
have opposite signs. If there were two such numbers then the intermediate value theorem would imply that
somewhere between $x_1$ and $x_2$ there was a $c$ with $f(c) = 0$. But we are assuming that $f(c) \ne 0$ whenever
$a < c < b$. □

4.2. Example. Consider

$$f(x) = (x - 3)(x - 1)^2(2x + 1)^3.$$

The zeros of $f$ (i.e. the solutions of $f(x) = 0$) are $-\frac12$, $1$, $3$. These numbers split the real line into four intervals

$$(-\infty, -\tfrac12), \quad (-\tfrac12, 1), \quad (1, 3), \quad (3, \infty).$$

Theorem 4.1 tells us that $f(x)$ cannot change its sign in any of these intervals. For instance, $f(x)$ has the
same sign for all $x$ in the first interval $(-\infty, -\frac12)$. Now we choose a number we like from this interval (e.g. $-1$)
and find the sign of $f(-1)$: $f(-1) = (-4)(-2)^2(-1)^3$ is positive. Therefore $f(x) > 0$ for all $x$ in the interval
$(-\infty, -\frac12)$. In the same way we find

$$\begin{array}{ll}
f(-1) = (-4)(-2)^2(-1)^3 > 0 & f(x) > 0 \text{ for } x < -\tfrac12 \\
f(0) = (-3)(-1)^2(1)^3 < 0 & f(x) < 0 \text{ for } -\tfrac12 < x < 1 \\
f(2) = (-1)(1)^2(5)^3 < 0 & f(x) < 0 \text{ for } 1 < x < 3 \\
f(4) = (1)(3)^2(9)^3 > 0 & f(x) > 0 \text{ for } x > 3.
\end{array}$$

If you know all the zeroes of a continuous function, then this method allows you to decide where the function
is positive or negative. However, when the given function is factored into easy functions, as in this example,
there is a different way of finding the signs of $f$. For each of the factors $x - 3$, $(x - 1)^2$ and $(2x + 1)^3$ it is easy
to determine the sign, for any given $x$. These signs can only change at a zero of the factor. Thus we have

• $x - 3$ is positive for $x > 3$ and negative for $x < 3$;

• $(x - 1)^2$ is always positive (except at $x = 1$);

• $(2x + 1)^3$ is positive for $x > -\frac12$ and negative for $x < -\frac12$.

Multiplying these signs we get the same conclusions as above. We can summarize this computation in the
following diagram:
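
The test-point method is mechanical enough to write as a few lines of code. The sketch below evaluates $f$ at one point inside each of the four intervals; the helper `sign` and the list names are illustrative choices.

```python
def f(x):
    return (x - 3) * (x - 1)**2 * (2*x + 1)**3

zeros = [-0.5, 1.0, 3.0]
test_points = [-1.0, 0.0, 2.0, 4.0]   # one point inside each interval

def sign(value):
    """Return '+', '-' or '0' according to the sign of value."""
    return "+" if value > 0 else "-" if value < 0 else "0"

signs = [sign(f(x)) for x in test_points]
print(signs)   # ['+', '-', '-', '+']
```

By Theorem 4.1 the sign found at each test point is the sign of $f$ on the whole interval containing it.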
5. Increasing and decreasing functions

Here are four very similar definitions; look closely to see how they differ.

• A function is called increasing if $a < b$ implies $f(a) < f(b)$ for all numbers $a$ and $b$ in the domain
of $f$.

• A function is called decreasing if $a < b$ implies $f(a) > f(b)$ for all numbers $a$ and $b$ in the domain
of $f$.

• The function $f$ is called non-decreasing if $a < b$ implies $f(a) \le f(b)$ for all numbers $a$ and $b$ in
the domain of $f$.

• The function $f$ is called non-increasing if $a < b$ implies $f(a) \ge f(b)$ for all numbers $a$ and $b$ in
the domain of $f$.

You can summarize these definitions as follows:

    f is ...            if for all a and b one has ...
    Increasing:         a < b  =>  f(a) < f(b)
    Decreasing:         a < b  =>  f(a) > f(b)
    Non-increasing:     a < b  =>  f(a) >= f(b)
    Non-decreasing:     a < b  =>  f(a) <= f(b)

The sign of the derivative of $f$ tells you if $f$ is increasing or not. More precisely:

5.1. Theorem. If a function is non-decreasing on an interval $a < x < b$ then $f'(x) \ge 0$ for all $x$ in that
interval.

If a function is non-increasing on an interval $a < x < b$ then $f'(x) \le 0$ for all $x$ in that interval.

For instance, if $f$ is non-decreasing, then for any given $x$ and any positive $\Delta x$ one has $f(x + \Delta x) \ge f(x)$
and hence

$$\frac{f(x + \Delta x) - f(x)}{\Delta x} \ge 0.$$

Now let $\Delta x \searrow 0$ and you find that

$$f'(x) = \lim_{\Delta x \searrow 0} \frac{f(x + \Delta x) - f(x)}{\Delta x} \ge 0.$$

What about the converse, i.e. if you know the sign of $f'$ then what can you say about $f$? For this we
have the following

5.2. Theorem. Suppose $f$ is a differentiable function on an interval $(a, b)$.

If $f'(x) > 0$ for all $a < x < b$, then $f$ is increasing.

If $f'(x) < 0$ for all $a < x < b$, then $f$ is decreasing.

The proof is based on the Mean Value theorem which also finds use in many other situations:


Figure 3. According to the Mean Value Theorem there always is some number $c$ between $a$ and $b$ such
that the tangent to the graph of $f$ is parallel to the line segment connecting the two points $(a, f(a))$
and $(b, f(b))$. This is true for any choice of $a$ and $b$; $c$ depends on $a$ and $b$ of course.

5.3. The Mean Value Theorem. If $f$ is a differentiable function on the interval $a \le x \le b$, then
there is some number $c$, with $a < c < b$ such that

$$f'(c) = \frac{f(b) - f(a)}{b - a}.$$

Proof of THEOREM 5.2. We show that $f'(x) > 0$ for all $x$ implies that $f$ is increasing. Let $x_1 < x_2$ be
two numbers between $a$ and $b$. Then the Mean Value Theorem implies that there is some $c$ between $x_1$ and
$x_2$ such that

$$f'(c) = \frac{f(x_2) - f(x_1)}{x_2 - x_1}$$

or

$$f(x_2) - f(x_1) = f'(c)(x_2 - x_1).$$

Since we know that $f'(c) > 0$ and $x_2 - x_1 > 0$ it follows that $f(x_2) - f(x_1) > 0$, i.e. $f(x_2) > f(x_1)$. □


6. Examples 

Armed with these theorems we can now split the graph of any function into increasing and decreasing parts simply by computing the derivative $f'(x)$ and finding out where $f'(x) > 0$ and where $f'(x) < 0$, i.e. we apply the method from the previous section to $f'$ rather than $f$.


6.1. Example: the parabola $y = x^2$. The familiar graph of $f(x) = x^2$ consists of two parts, one decreasing and one increasing. You can see this from the derivative, which is

$$f'(x) = 2x \quad \begin{cases} > 0 & \text{for } x > 0 \\ < 0 & \text{for } x < 0. \end{cases}$$

Therefore the function $f(x) = x^2$ is decreasing for $x < 0$ and increasing for $x > 0$.
6.2. Example: the hyperbola $y = 1/x$. The derivative of the function $f(x) = 1/x = x^{-1}$ is

$$f'(x) = -\frac{1}{x^2},$$

which is always negative. You would therefore think that this function is decreasing, or at least non-increasing: if $a < b$ then $1/a \ge 1/b$. But this isn’t true if you take $a = -1$ and $b = 1$:

$$a = -1 < 1 = b, \quad\text{but}\quad \frac{1}{a} = -1 < 1 = \frac{1}{b}\ !!$$

The problem is that we used Theorem 5.2, but if you carefully read that theorem then you see that it applies to functions that are defined on an interval. The function in this example, $f(x) = 1/x$, is not defined on the interval $-1 < x < 1$ because it isn’t defined at $x = 0$. That’s why you can’t conclude that $f(x) = 1/x$ is decreasing from $x = -1$ to $x = +1$.

On the other hand, the function is defined and differentiable on the interval $0 < x < \infty$, so Theorem 5.2 tells us that $f(x) = 1/x$ is decreasing for $x > 0$. This means that, as long as $x$ is positive, increasing $x$ will decrease $1/x$.
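The point of this example can be checked with a few sample values. The sketch below is only an illustration: the helper `is_decreasing_on` and the sample points are assumptions made here, and testing finitely many points is a sanity check rather than a proof.

```python
def f(x):
    return 1.0 / x

def is_decreasing_on(func, xs):
    """Check func(u) > func(v) for each pair of consecutive sample points u < v."""
    return all(func(u) > func(v) for u, v in zip(xs, xs[1:]))

positive_side = [0.5, 1.0, 2.0, 4.0]      # all inside the interval 0 < x < infinity
negative_side = [-4.0, -2.0, -1.0, -0.5]  # all inside the interval -infinity < x < 0
across_zero = [-1.0, 1.0]                 # straddles x = 0, where f is undefined

print(is_decreasing_on(f, positive_side))  # decreasing on samples with x > 0
print(is_decreasing_on(f, negative_side))  # decreasing on samples with x < 0
print(is_decreasing_on(f, across_zero))    # fails: f(-1) = -1 is less than f(1) = 1
```

On each side of $x = 0$ the samples decrease, but the comparison across $x = 0$ fails, which matches the discussion above.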


6.3. Graph of a cubic function. Consider the function

$$y = f(x) = x^3 - x.$$

Its derivative is

$$f'(x) = 3x^2 - 1.$$

We try to find out where $f'$ is positive, and where it is negative, by factoring $f'(x)$:

$$f'(x) = 3\left(x^2 - \tfrac13\right) = 3\left(x + \tfrac13\sqrt3\right)\left(x - \tfrac13\sqrt3\right),$$

from which you see that

$f'(x) > 0$ for $x < -\tfrac13\sqrt3$
$f'(x) < 0$ for $-\tfrac13\sqrt3 < x < \tfrac13\sqrt3$
$f'(x) > 0$ for $x > \tfrac13\sqrt3$

Therefore the function $f$ is

increasing on $(-\infty, -\tfrac13\sqrt3)$, decreasing on $(-\tfrac13\sqrt3, \tfrac13\sqrt3)$, increasing on $(\tfrac13\sqrt3, \infty)$.

At the two points $x = \pm\tfrac13\sqrt3$ one has $f'(x) = 0$, so there the tangent will be horizontal. This leads us to the following picture of the graph of $f$:

Figure 4. The graph of $f(x) = x^3 - x$, with horizontal tangents at $x = -\tfrac13\sqrt3$ and $x = \tfrac13\sqrt3$.
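The sign analysis for the cubic can be double-checked by evaluating $f'$ at one sample point in each of the three intervals. This is a minimal sketch; the sample points $-1$, $0$, $1$ and the helper name `behaviour` are choices made for this illustration.

```python
import math

def fprime(x):
    return 3 * x ** 2 - 1      # derivative of x**3 - x

r = math.sqrt(3) / 3           # the stationary points are x = -r and x = +r

def behaviour(x):
    """Report what the sign of f'(x) says about f near x."""
    d = fprime(x)
    if d > 0:
        return "increasing"
    if d < 0:
        return "decreasing"
    return "stationary"

# One sample point from each interval (-inf, -r), (-r, r), (r, inf).
samples = {"x < -r": -1.0, "-r < x < r": 0.0, "x > r": 1.0}
result = {label: behaviour(x) for label, x in samples.items()}
print(result)
```

Because $f'$ is continuous and vanishes only at $\pm r$, its sign cannot change inside any of the three intervals, so one sample per interval is enough here.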


6.4. A function whose tangent turns up and down infinitely often near the origin. We end with a weird example. Somewhere in the mathematician’s zoo of curious functions the following will be on exhibit. Consider the function

$$f(x) = \frac{x}{2} + x^2 \sin\frac{\pi}{x}.$$

Figure 5. Positive derivative at a point ($x = 0$) does not mean that the function is “increasing near that point.” The slopes at the intersection points alternate between $\tfrac12 - \pi$ and $\tfrac12 + \pi$.

For $x = 0$ this formula is undefined, and we are free to define $f(0) = 0$. This makes the function continuous at $x = 0$. In fact, this function is differentiable at $x = 0$, with derivative given by

$$f'(0) = \lim_{x \to 0} \frac{f(x) - f(0)}{x - 0} = \lim_{x \to 0} \left(\frac12 + x \sin\frac{\pi}{x}\right) = \frac12.$$

(To find the limit apply the sandwich theorem to $-|x| \le x \sin\frac{\pi}{x} \le |x|$.)

So the slope of the tangent to the graph at the origin is positive ($\tfrac12$), and one would think that the function should be increasing near $x = 0$ (i.e. bigger $x$ gives bigger $f(x)$). The point of this example is that this turns out not to be true.


To explain why not, we must compute the derivative of this function for $x \ne 0$. It is given by

$$f'(x) = \frac12 - \pi \cos\frac{\pi}{x} + 2x \sin\frac{\pi}{x}.$$

Now consider the sequence of intersection points $P_1, P_2, \ldots$ of the graph with the line $y = x/2$. They are

$$P_k(x_k, y_k), \qquad x_k = \frac1k, \qquad y_k = f(x_k).$$

For larger and larger $k$ the points $P_k$ tend to the origin (the $x$ coordinate is $\frac1k$, which goes to 0 as $k \to \infty$). The slope of the tangent at $P_k$ is given by

$$f'(x_k) = \frac12 - \pi \cos k\pi + \frac2k \sin k\pi = \frac12 - \pi(-1)^k = \begin{cases} \tfrac12 - \pi \approx -2.64159265358979\ldots & \text{for } k \text{ even} \\ \tfrac12 + \pi \approx +3.64159265358979\ldots & \text{for } k \text{ odd,} \end{cases}$$

since $\cos k\pi = (-1)^k$ and $\sin k\pi = 0$.

In other words, along the sequence of points $P_k$ the slope of the tangent flip-flops between $\tfrac12 - \pi$ and $\tfrac12 + \pi$, i.e. between a negative and a positive number.

In particular, the slope of the tangent at the even intersection points is negative, and so you would expect the function to be decreasing there. In other words we see that even though the derivative at $x = 0$ of this function is positive, there are points on the graph arbitrarily close to the origin where the tangent has negative slope.
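The alternating slopes can be confirmed numerically from the formula for $f'(x)$ with $x \ne 0$. The snippet below is a small sketch; the choice of the first eight intersection points is arbitrary.

```python
import math

def fprime(x):
    """Derivative of x/2 + x**2 * sin(pi/x), valid for x != 0."""
    return 0.5 - math.pi * math.cos(math.pi / x) + 2 * x * math.sin(math.pi / x)

# Slopes at the intersection points x_k = 1/k for k = 1, ..., 8.
slopes = [fprime(1.0 / k) for k in range(1, 9)]
for k, s in enumerate(slopes, start=1):
    print(k, round(s, 6))
```

Up to rounding error, the printed slopes alternate between $\tfrac12 + \pi$ (odd $k$) and $\tfrac12 - \pi$ (even $k$), no matter how close $x_k = 1/k$ is to the origin.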


7. Maxima and Minima 

A function has a global maximum at some $a$ in its domain if $f(x) \le f(a)$ for all other $x$ in the domain of $f$. Global maxima are sometimes also called “absolute maxima.”

A function has a local maximum at some $a$ in its domain if there is a small $\delta > 0$ such that $f(x) \le f(a)$ for all $x$ with $a - \delta < x < a + \delta$ which lie in the domain of $f$.

Every global maximum is a local maximum, but a local maximum doesn’t have to be a global maximum.

7.1. Where to find local maxima and minima. Any $x$ value for which $f'(x) = 0$ is called a stationary point for the function $f$.


7.2. Theorem. Suppose $f$ is a differentiable function on some interval $[a, b]$.

Every local maximum or minimum of $f$ is either one of the end points of the interval $[a, b]$, or else it is a stationary point for the function $f$.

Proof. Suppose that $f$ has a local maximum at $x$ and suppose that $x$ is not $a$ or $b$. By assumption the left and right hand limits

$$f'(x) = \lim_{\Delta x \nearrow 0} \frac{f(x + \Delta x) - f(x)}{\Delta x} \quad\text{and}\quad f'(x) = \lim_{\Delta x \searrow 0} \frac{f(x + \Delta x) - f(x)}{\Delta x}$$

both exist and they are equal.
Figure 6. A function defined on an interval $[a, b]$ with one interior absolute minimum, another interior local minimum, an interior local maximum, and two local maxima on the boundary, one of which is in fact an absolute maximum.


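Since $f$ has a local maximum at $x$ we have $f(x + \Delta x) - f(x) \le 0$ if $-\delta < \Delta x < \delta$. In the first limit we also have $\Delta x < 0$, so the quotient of two non-positive quantities is non-negative, and

$$\lim_{\Delta x \nearrow 0} \frac{f(x + \Delta x) - f(x)}{\Delta x} \ge 0.$$

Hence $f'(x) \ge 0$.

In the second limit we have $\Delta x > 0$, so

$$\lim_{\Delta x \searrow 0} \frac{f(x + \Delta x) - f(x)}{\Delta x} \le 0,$$

which implies $f'(x) \le 0$.

Thus we have shown that $f'(x) \ge 0$ and $f'(x) \le 0$ at the same time. This can only be true if $f'(x) = 0$. □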


7.3. How to tell if a stationary point is a maximum, a minimum, or neither. If $f'(c) = 0$ then $c$ is a stationary point (by definition), and it might be a local maximum or a local minimum. You can tell what kind of stationary point $c$ is by looking at the signs of $f'(x)$ for $x$ near $c$.

7.4. Theorem. If in some small interval $(c - \delta, c + \delta)$ you have $f'(x) < 0$ for $x < c$ and $f'(x) > 0$ for $x > c$ then $f$ has a local minimum at $x = c$.

If in some small interval $(c - \delta, c + \delta)$ you have $f'(x) > 0$ for $x < c$ and $f'(x) < 0$ for $x > c$ then $f$ has a local maximum at $x = c$.

The reason is simple: if $f$ increases to the left of $c$ and decreases to the right of $c$ then it has a maximum at $c$. More precisely:

if $f'(x) > 0$ for $x$ between $c - \delta$ and $c$, then $f$ is increasing for $c - \delta < x < c$ and therefore $f(x) < f(c)$ for $x$ between $c - \delta$ and $c$.

If in addition $f'(x) < 0$ for $x > c$ then $f$ is decreasing for $x$ between $c$ and $c + \delta$, so that $f(x) < f(c)$ for those $x$.

Combine these two facts and you get $f(x) \le f(c)$ for $c - \delta < x < c + \delta$.
7.5. Example: local maxima and minima of $f(x) = x^3 - x$. In §6.3 we had found that the function $f(x) = x^3 - x$ is increasing when $-\infty < x < -\tfrac13\sqrt3$, and also when $\tfrac13\sqrt3 < x < \infty$, while it is decreasing when $-\tfrac13\sqrt3 < x < \tfrac13\sqrt3$. It follows that the function has a local maximum at $x = -\tfrac13\sqrt3$, and a local minimum at $x = \tfrac13\sqrt3$.

Neither the local maximum nor the local minimum are global max or min since

$$\lim_{x \to -\infty} f(x) = -\infty \quad\text{and}\quad \lim_{x \to \infty} f(x) = +\infty.$$

7.6. A stationary point that is neither a maximum nor a minimum. If you look for stationary points of the function $f(x) = x^3$ you find that there’s only one, namely $x = 0$. The derivative $f'(x) = 3x^2$ does not change sign at $x = 0$, so the test in Theorem 7.4 does not tell us anything.

And in fact, $x = 0$ is neither a local maximum nor a local minimum since $f(x) < f(0)$ for $x < 0$ and $f(x) > f(0)$ for $x > 0$.
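The sign test of Theorem 7.4 can be sketched in a few lines of Python. This is only a heuristic illustration: it samples $f'$ at one point on each side of $c$, and the step `delta` and the function name `classify` are assumptions made here, so it is not a substitute for the sign analysis done by hand.

```python
import math

def classify(fprime, c, delta=1e-3):
    """Guess the type of the stationary point c from the sign of f' on either side."""
    left, right = fprime(c - delta), fprime(c + delta)
    if left > 0 and right < 0:
        return "local maximum"
    if left < 0 and right > 0:
        return "local minimum"
    return "no conclusion"      # f' does not change sign at the sampled points

r = math.sqrt(3) / 3
cubic_prime = lambda x: 3 * x ** 2 - 1   # derivative of x**3 - x
cube_prime = lambda x: 3 * x ** 2        # derivative of x**3

print(classify(cubic_prime, -r))
print(classify(cubic_prime, r))
print(classify(cube_prime, 0.0))
```

For $x^3 - x$ the two stationary points are classified as a local maximum and a local minimum, while for $x^3$ the derivative keeps the same sign on both sides of $0$ and the test reaches no conclusion.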



8. Must there always be a maximum? 

Theorem 7.2 is very useful since it tells you how to find (local) maxima and minima. The following 
theorem is also useful, but in a different way. It doesn’t say how to find maxima or minima, but it tells you 
that they do exist, and hence that you are not wasting your time trying to find a maximum or minimum. 

8.1. Theorem. Let $f$ be a continuous function defined on the closed interval $a \le x \le b$. Then $f$ attains its maximum and also its minimum somewhere in this interval. In other words there exist real numbers $c$ and $d$ such that

$$f(c) \le f(x) \le f(d)$$

whenever $a \le x \le b$.

The proof of this theorem requires a more careful definition of the real numbers than we have given in Chapter 1, and we will take the theorem for granted.

9. Examples — functions with and without maxima or minima 

In the following examples we explore what can happen if some of the hypotheses in Theorem 8.1 are not met.




Figure 7. The function on the left has no maximum, and the one on the right has no minimum. 


9.1. Question: Does the function

$$f(x) = \begin{cases} x & \text{for } 0 \le x < 1 \\ 0 & \text{for } x = 1 \end{cases}$$

have a maximum on the interval $0 \le x \le 1$?

Answer: No. What would the maximal value be? Since

$$\lim_{x \nearrow 1} f(x) = \lim_{x \nearrow 1} x = 1,$$

the maximal value cannot be less than 1. On the other hand the function is never larger than 1. So if there were a number $a$ in the interval $[0, 1]$ such that $f(a)$ was the maximal value of $f$, then we would have $f(a) = 1$. If you now search the interval for numbers $a$ with $f(a) = 1$, then you notice that such an $a$ does not exist. Conclusion: this function does not attain its maximum on the interval $[0, 1]$.

What about Theorem 8.1? That theorem only applies to continuous functions, and the function $f$ in this example is not continuous at $x = 1$. For at $x = 1$ one has

$$f(1) = 0 \ne 1 = \lim_{x \nearrow 1} f(x).$$

So all it takes for the Theorem to fail is that the function $f$ be discontinuous at just one point.

9.2. Question: Does the function

$$f(x) = \frac{1}{x^2}, \quad 1 \le x < \infty$$

have a maximum or minimum?

Answer: The function has a maximum at $x = 1$, but it has no minimum.

Concerning the maximum: if $x \ge 1$ then $f(x) = 1/x^2 \le 1$, while $f(1) = 1$. Hence $f(x) \le f(1)$ for all $x$ in the interval $[1, \infty)$ and that is why $f$ attains its maximum at $x = 1$.

If we look for a minimal value of $f$ then we note that $f(x) \ge 0$ for all $x$ in the interval $[1, \infty)$, and also that

$$\lim_{x \to \infty} f(x) = 0,$$

so that if $f$ attains a minimum at some $a$ with $1 \le a < \infty$, then the minimal value $f(a)$ must be zero. However, the equation $f(a) = 0$ has no solution: $f$ does not attain its minimum.

Why does Theorem 8.1 not apply? In this example the function $f$ is continuous on the whole interval $[1, \infty)$, but this interval is not a closed interval in the sense of the theorem, i.e. it is not of the form $[a, b]$ with two finite endpoints.


10. General method for sketching the graph of a function 

Given a differentiable function $f$ defined on some interval $a \le x \le b$, you can find the increasing and decreasing parts of the graph, as well as all the local maxima and minima by following this procedure:

(1) find all solutions of $f'(x) = 0$ in the interval $[a, b]$: these are called the critical or stationary points for $f$.

(2) find the sign of $f'(x)$ at all other points.

(3) each stationary point at which $f'(x)$ actually changes sign is a local maximum or local minimum. Compute the function value $f(x)$ at each stationary point.

(4) compute the function values at the endpoints of the interval, i.e. compute $f(a)$ and $f(b)$.

(5) the absolute maximum is attained at the stationary point or the boundary point with the highest function value; the absolute minimum occurs at the boundary or stationary point with the smallest function value.

If the interval is unbounded, i.e. if the function is defined for $-\infty < x < \infty$, then you can’t compute the values $f(a)$ and $f(b)$, but instead you should compute $\lim_{x \to \pm\infty} f(x)$.
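Steps (3) to (5) of the procedure, for a closed bounded interval, can be sketched as follows. The helper name `extremes_on_interval` is hypothetical, the stationary points are assumed to have been found by hand in step (1), and the example function $x^3 - x$ on $[-2, 2]$ is chosen here only for illustration.

```python
import math

def extremes_on_interval(f, stationary_points, a, b):
    """Compare f at the endpoints and at the interior stationary points of [a, b]."""
    candidates = [a, b] + [x for x in stationary_points if a < x < b]
    x_min = min(candidates, key=f)
    x_max = max(candidates, key=f)
    return (x_min, f(x_min)), (x_max, f(x_max))

def f(x):
    return x ** 3 - x

r = math.sqrt(3) / 3                      # stationary points of f are -r and +r
lowest, highest = extremes_on_interval(f, [-r, r], -2.0, 2.0)
print("absolute minimum:", lowest)
print("absolute maximum:", highest)
```

In this example both absolute extremes occur at the endpoints, because $f(\pm 2) = \pm 6$ is much larger in size than the values of $f$ at the two stationary points.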
10.1. Example: the graph of a rational function. Let’s “sketch the graph” of the function

$$f(x) = \frac{x(1 - x)}{1 + x^2}.$$

By looking at the signs of numerator and denominator we see that

$f(x) > 0$ for $0 < x < 1$
$f(x) < 0$ for $x < 0$ and also for $x > 1$.

We compute the derivative of $f$:

$$f'(x) = \frac{1 - 2x - x^2}{(1 + x^2)^2}.$$

Hence $f'(x) = 0$ holds if and only if

$$1 - 2x - x^2 = 0,$$

and the solutions to this quadratic equation are $-1 \pm \sqrt2$. These two roots will appear several times and it will shorten our formulas if we abbreviate

$$A = -1 - \sqrt2 \quad\text{and}\quad B = -1 + \sqrt2.$$

To see if the derivative changes sign we factor the numerator and denominator. The denominator is always positive, and the numerator is

$$-x^2 - 2x + 1 = -(x^2 + 2x - 1) = -(x - A)(x - B).$$

Therefore

$$f'(x) \quad \begin{cases} < 0 & \text{for } x < A \\ > 0 & \text{for } A < x < B \\ < 0 & \text{for } x > B. \end{cases}$$

It follows that $f$ is decreasing on the interval $(-\infty, A)$, increasing on the interval $(A, B)$ and decreasing again on the interval $(B, \infty)$. Therefore

$A$ is a local minimum, and $B$ is a local maximum.

Are these global maxima and minima?

Figure 8. The graph of $f(x) = (x - x^2)/(1 + x^2)$.

Since we are dealing with an unbounded interval we must compute the limits of $f(x)$ as $x \to \pm\infty$. You find

$$\lim_{x \to \infty} f(x) = \lim_{x \to -\infty} f(x) = -1.$$

Since $f$ is decreasing between $-\infty$ and $A$, it follows that

$$f(A) \le f(x) < -1 \quad\text{for } -\infty < x \le A.$$

Similarly, $f$ is decreasing from $B$ to $+\infty$, so

$$-1 < f(x) \le f(B) \quad\text{for } B \le x < \infty.$$
Figure 9. If a graph is convex then all chords lie above the graph. If it is not convex then some chords will cross the graph or lie below it.


Between the two stationary points the function is increasing, so

$$f(A) \le f(x) \le f(B) \quad\text{for } A \le x \le B.$$

From this it follows that $f(x)$ is the smallest it can be when $x = A = -1 - \sqrt2$ and at its largest when $x = B = -1 + \sqrt2$: the local maximum and minimum which we found are in fact a global maximum and minimum.
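The conclusion of this example can be sanity-checked numerically. The sketch below evaluates $f(x) = (x - x^2)/(1 + x^2)$ on a grid; the grid range $[-50, 50]$ and its spacing are arbitrary choices made here, and a grid check is of course not a proof.

```python
import math

def f(x):
    return (x - x ** 2) / (1 + x ** 2)

A = -1 - math.sqrt(2)      # stationary point found above (local minimum)
B = -1 + math.sqrt(2)      # stationary point found above (local maximum)

xs = [-50 + 0.01 * i for i in range(10001)]   # grid on [-50, 50]
bounded = all(f(A) - 1e-12 <= f(x) <= f(B) + 1e-12 for x in xs)

print(f(A), f(B), bounded)
```

Working the values out by hand gives $f(A) = -(1 + \sqrt2)/2$ and $f(B) = (\sqrt2 - 1)/2$, so $f(A) < -1 < f(B)$, consistent with the limits at $\pm\infty$.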


11. Convexity, Concavity and the Second Derivative 

By definition, a function $f$ is convex on some interval $a < x < b$ if the line segment connecting any pair of points on the graph lies above the piece of the graph between those two points.

The function is called concave if the line segment connecting any pair of points on the graph lies below the piece of the graph between those two points.

A point on the graph of $f$ where $f''(x)$ changes sign is called an inflection point.

Instead of “convex” and “concave” one often says “curved upwards” or “curved downwards.”

You can use the second derivative to tell if a function is concave or convex.

11.1. Theorem. A function $f$ is convex on some interval $a < x < b$ if and only if $f''(x) \ge 0$ for all $x$ on that interval.

11.2. Theorem. A function $f$ is convex on some interval $a < x < b$ if and only if the derivative $f'(x)$ is a nondecreasing function on that interval.

A proof using the Mean Value Theorem will be given in class.



Figure 10. At an inflection point the tangent crosses the graph. 
11.3. Example: the cubic function $f(x) = x^3 - x$. The second derivative of the function $f(x) = x^3 - x$ is

$$f''(x) = 6x,$$

which is positive for $x > 0$ and negative for $x < 0$. Hence, in the graph in §6.3, the origin is an inflection point, and the piece of the graph where $x > 0$ is convex, while the piece where $x < 0$ is concave.


11.4. The second derivative test. In §7.3 we saw how you can tell if a stationary point is a local 
maximum or minimum by looking at the sign changes of f'(x). There is another way of distinguishing 
between local maxima and minima which involves computing the second derivative. 


11.5. Theorem. If $c$ is a stationary point for a function $f$, and if $f''(c) < 0$ then $f$ has a local maximum at $x = c$.

If $f''(c) > 0$ then $f$ has a local minimum at $c$.

The theorem doesn’t say what happens when $f''(c) = 0$. In that case you must go back to checking the signs of the first derivative near the stationary point.

The basic reason why this theorem is true is that if $c$ is a stationary point with $f''(c) > 0$ then “$f'(x)$ is increasing near $x = c$” and hence $f'(x) < 0$ for $x < c$ and $f'(x) > 0$ for $x > c$. So the function $f$ is decreasing for $x < c$ and increasing for $x > c$, and therefore it reaches a local minimum at $x = c$.


11.6. Example: that cubic function again. Consider the function $f(x) = x^3 - x$ from §6.3 and §11.3. We had found that this function has two stationary points, namely at $x = \pm\tfrac13\sqrt3$. By looking at the sign of $f'(x) = 3x^2 - 1$ we concluded that $-\tfrac13\sqrt3$ is a local maximum while $+\tfrac13\sqrt3$ is a local minimum. Instead of looking at $f'(x)$ we could also have computed $f''(x)$ at $x = \pm\tfrac13\sqrt3$ and applied the second derivative test. Here is how it goes:

Since $f''(x) = 6x$ we have

$$f''\left(-\tfrac13\sqrt3\right) = -2\sqrt3 < 0 \quad\text{and}\quad f''\left(\tfrac13\sqrt3\right) = 2\sqrt3 > 0.$$

Therefore $f$ has a local maximum at $-\tfrac13\sqrt3$ and a local minimum at $\tfrac13\sqrt3$.
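The second derivative test is easy to express in code. The following is a minimal sketch; the function name `second_derivative_test` is hypothetical, and the second derivative is supplied by hand rather than computed symbolically.

```python
import math

def second_derivative_test(fsecond, c):
    """Classify a stationary point c using only the sign of f''(c)."""
    value = fsecond(c)
    if value < 0:
        return "local maximum"
    if value > 0:
        return "local minimum"
    return "inconclusive"       # f''(c) = 0: the test says nothing

fsecond = lambda x: 6 * x       # second derivative of x**3 - x
r = math.sqrt(3) / 3

at_minus_r = second_derivative_test(fsecond, -r)
at_plus_r = second_derivative_test(fsecond, r)
print(at_minus_r, at_plus_r)
```

Note that the function assumes $c$ really is a stationary point; it does not check $f'(c) = 0$ itself.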

11.7. When the second derivative test doesn’t work. Usually the second derivative test will work, 
but sometimes a stationary point c has /"(c) = 0. In this case the second derivative test gives no information 
at all. The figure below shows you the graphs of three functions, all three of which have a stationary point at 
x = 0. In all three cases the second derivative vanishes at x = 0 so the second derivative test says nothing. 
As you can see, the stationary point can be a local maximum, a local minimum, or neither. 



Figure 11. Three functions for which the second derivative test doesn’t work. 
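To see the three possibilities concretely, one can take $x^4$, $-x^4$ and $x^3$ as examples. These three functions are an assumption made for this sketch (the notes do not say which functions are drawn in the figure); each has a stationary point at $x = 0$ with vanishing second derivative. The helper `classify_by_values` and the step `h` are also choices made here, and comparing a few function values is a heuristic, not a proof.

```python
def classify_by_values(f, c, h=1e-2):
    """Compare f(c) with the values of f slightly to the left and right of c."""
    left, mid, right = f(c - h), f(c), f(c + h)
    if left > mid and right > mid:
        return "local minimum"
    if left < mid and right < mid:
        return "local maximum"
    return "neither"

# (function, its second derivative) for three sample functions
examples = {
    "x**4": (lambda x: x ** 4, lambda x: 12 * x ** 2),
    "-x**4": (lambda x: -x ** 4, lambda x: -12 * x ** 2),
    "x**3": (lambda x: x ** 3, lambda x: 6 * x),
}

outcome = {}
for name, (f, fsecond) in examples.items():
    assert fsecond(0.0) == 0          # second derivative test gives no information
    outcome[name] = classify_by_values(f, 0.0)
print(outcome)
```

In all three cases the second derivative vanishes at $0$, yet the stationary point is a minimum, a maximum, and neither, respectively.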
12. Proofs of some of the theorems

12.1. Proof of the Mean Value Theorem. Let $m$ be the slope of the chord connecting the points $(a, f(a))$ and $(b, f(b))$, i.e.

$$m = \frac{f(b) - f(a)}{b - a},$$

and consider the function

$$g(x) = f(x) - f(a) - m(x - a).$$

This function is continuous (since $f$ is continuous), and $g$ attains its maximum and minimum at two numbers $c_{\min}$ and $c_{\max}$.

There are now two possibilities: either at least one of $c_{\min}$ or $c_{\max}$ is an interior point, or else both $c_{\min}$ and $c_{\max}$ are endpoints of the interval $a \le x \le b$.

Consider the first case: one of these two numbers is an interior point, i.e. if $a < c_{\min} < b$ or $a < c_{\max} < b$, then the derivative of $g$ must vanish at $c_{\min}$ or $c_{\max}$. If one has $g'(c_{\min}) = 0$, then one has

$$0 = g'(c_{\min}) = f'(c_{\min}) - m, \quad\text{i.e.}\quad m = f'(c_{\min}).$$

The definition of $m$ implies that one gets

$$f'(c_{\min}) = \frac{f(b) - f(a)}{b - a}.$$

If $g'(c_{\max}) = 0$ then one gets $m = f'(c_{\max})$ and hence

$$f'(c_{\max}) = \frac{f(b) - f(a)}{b - a}.$$

We are left with the remaining case, in which both $c_{\min}$ and $c_{\max}$ are end points. To deal with this case note that at the endpoints one has

$$g(a) = 0 \quad\text{and}\quad g(b) = 0.$$

Thus the maximal and minimal values of $g$ are both zero! This means that $g(x) = 0$ for all $x$, and thus that $g'(x) = 0$ for all $x$. Therefore we get $f'(x) = m$ for all $x$, and not just for some $c$.


12.2. Proof of Theorem 5.1. If $f$ is a non-decreasing function and if it is differentiable at some interior point $a$, then we must show that $f'(a) \ge 0$.

Since $f$ is non-decreasing, one has $f(x) \ge f(a)$ for all $x > a$. Hence one also has

$$\frac{f(x) - f(a)}{x - a} \ge 0$$

for all $x > a$. Let $x \searrow a$, and you get

$$f'(a) = \lim_{x \searrow a} \frac{f(x) - f(a)}{x - a} \ge 0.$$


12.3. Proof of Theorem 5.2. Suppose $f$ is a differentiable function on an interval $a < x < b$, and suppose that $f'(x) \ge 0$ on that interval. We must show that $f$ is non-decreasing on that interval, i.e. we have to show that if $x_1 < x_2$ are two numbers in the interval $(a, b)$, then $f(x_1) \le f(x_2)$. To prove this we use the Mean Value Theorem: given $x_1$ and $x_2$ the Mean Value Theorem hands us a number $c$ with $x_1 < c < x_2$, and

$$f'(c) = \frac{f(x_2) - f(x_1)}{x_2 - x_1}.$$

We don’t know where $c$ is exactly, but it doesn’t matter because we do know that wherever $c$ is we have $f'(c) \ge 0$. Hence

$$\frac{f(x_2) - f(x_1)}{x_2 - x_1} \ge 0.$$

Multiply with $x_2 - x_1$ (which we are allowed to do since $x_2 > x_1$ so $x_2 - x_1 > 0$) and you get

$$f(x_2) - f(x_1) \ge 0,$$

as claimed.


13. Exercises 


218. What does the Intermediate Value Theorem say? 

219. What does the Mean Value Theorem say? 

220. Group Problem. 

If $f(a) = 0$ and $f(b) = 0$ then there is a $c$ between
$a$ and $b$ such that $f'(c) = 0$. Show that this follows
from the Mean Value Theorem. (Help! A proof! Relax: 
this one is not difficult. Make a drawing of the situation, 
then read the Mean Value Theorem again.) 

221. What is a stationary point? 

222. Group Problem. 

How can you tell if a local maximum is a global 
maximum? 

223. Group Problem. 

If $f''(a) = 0$ then the graph of $f$ has an inflection
point at $x = a$. True or False?

224. What is an inflection point? 

225. Give an example of a function for which $f'(0) = 0$
even though the function $f$ has neither a local maximum
nor a local minimum at $x = 0$.

226. Group Problem. 

Draw four graphs of functions, one for each of the 
following four combinations 

$f' > 0$ and $f'' > 0$;  $f' > 0$ and $f'' < 0$;

$f' < 0$ and $f'' > 0$;  $f' < 0$ and $f'' < 0$.

227. Group Problem. 

Which of the following combinations are possible: 

$f'(x) > 0$ and $f''(x) = 0$ for all $x$
$f'(x) = 0$ and $f''(x) > 0$ for all $x$

Sketch the graph of the following functions. You 
should 

(1) find where /, /' and f" are positive or negative 

(2) find all stationary points 

(3) decide which stationary points are local maxima
or minima

(4) decide which local max/minima are in fact 
global max/minima 

(5) find all inflection points 

(6) find “horizontal asymptotes,” i.e. compute the
limits $\lim_{x \to \pm\infty} f(x)$ when appropriate.


228. $y = x^3 + 2x^2$

229. $y = x^3 - 4x^2$

230. $y = x^4 + 27x$

231. $y = x^4 - 27x$

232. $y = x^4 + 2x^2 - 3$

233. $y = x^4 - 5x^2 + 4$

234. $y = x^5 + 16x$

235. $y = x^5 - 16x$

236. $y = \dfrac{x}{x + 1}$

237. $y = \dfrac{x}{1 + x^2}$

238. $y = \dfrac{x^2}{1 + x^2}$

239. $y = \dfrac{1 + x^2}{1 + x}$

240. $y = x + \dfrac{1}{x}$

241. $y = x - \dfrac{1}{x}$

242. $y = x^3 + 2x^2 + x$

243. $y = x^3 + 2x^2 - x$

244. $y = x^4 - x^3 - x$

245. $y = x^4 - 2x^3 + 2x$

246. $y = \sqrt{1 + x^2}$

247. $y = \sqrt{1 - x^2}$

248. $y = \sqrt[4]{1 + x^2}$

249. $y = \dfrac{1}{1 + x^4}$


The following functions are periodic, i.e. they satisfy
$f(x + L) = f(x)$ for all $x$, where the constant $L$ is
called the period of the function. The graph of a periodic
function repeats itself indefinitely to the left and to the
right. It therefore has infinitely many (local) minima and
maxima, and infinitely many inflection points. Sketch
the graphs of the following functions as in the previous
problem, but only list those “interesting points” that lie
in the interval $0 < x < 2\pi$.

250. y = sin x 

251. y = sin x + cos x 

252. y = sin x + sin 2 x 

253. y = 2 sin x + sin 2 x 
254. y = 4 sin x + sin 2 x

255. y = 2 cos x + cos 2 x 

256. $y = \dfrac{4}{2 + \sin x}$

257. $y = (2 + \sin x)^2$

Find the domain and sketch the graphs of each of 
the following functions 

258. $y = \arcsin x$

259. $y = \arctan x$

260. $y = 2 \arctan x - x$

261. $y = \arctan(x^2)$

262. $y = 3 \arcsin(x) - 5x$

263. $y = 6 \arcsin(x) - 10x^2$


264. In the following two problems it is not possible to solve 
the equation f'(x) = 0, but you can tell something from 
the second derivative. 

(a) Show that the function $f(x) = x \arctan x$ is
convex. Then sketch the graph of $f$.

(b) Show that the function $g(x) = x \arcsin x$ is
convex. Then sketch the graph of $g$.

For each of the following functions use the derivative 
to decide if they are increasing, decreasing or neither on 
the indicated intervals 


265. $f(x) = \dfrac{x}{1 + x^2}$,  $10 < x < \infty$

266. $f(x) = \dfrac{2 + x^2}{x^3 - x}$,  $1 < x < \infty$

267. $f(x) = \dfrac{2 + x^2}{x^3 - x}$,  $0 < x < 1$

268. $f(x) = \dfrac{2 + x^2}{x^3 - x}$,  $0 < x < \infty$


14. Optimization Problems 

Often a problem can be phrased as 

For which value of $x$ in the interval $a \le x \le b$ is $f(x)$ the largest?

In other words you are given a function $f$ on an interval $[a, b]$ and you must find all global maxima of $f$ on this interval.

If the function is continuous then according to Theorem 8.1 there always is at least one $x$ in the interval $[a, b]$ which maximizes $f(x)$.

If $f$ is differentiable then we know what to do: any local maximum is either a stationary point or one of the end points $a$ and $b$. Therefore you can find the global maxima by following this recipe:

(1) Find all stationary points of $f$;

(2) Compute $f(x)$ at each stationary point you found in step (1);

(3) Compute $f(a)$ and $f(b)$;

(4) The global maxima are those stationary points or endpoints from steps (2) and (3) which have the largest function value.

Usually there is only one global maximum, but sometimes there can be more. 

If you have to minimize rather than maximize a function, then you must look for global minima. The 
same recipe works (of course you should look for the smallest function value instead of the largest in step 4.) 

The difficulty in optimization problems frequently lies not with the calculus part, but rather with setting 
up the problem. Choosing which quantity to call x and finding the function / is half the job. 

14.1. Example — The rectangle with largest area and given perimeter. Which rectangle has 
the largest area, among all those rectangles for which the total length of the sides is 1? 

Solution: If the sides of the rectangle have lengths x and y, then the total length of the sides is 

L = x + x + y + y = 2(x + y) 

and the area of the rectangle is 

A = xy. 

So we are asked to find the largest possible value of $A = xy$ provided $2(x + y) = 1$. The lengths of the sides also cannot be negative, so $x$ and $y$ must satisfy $x \ge 0$, $y \ge 0$.
We now want to turn this problem into a question of the form “maximize a function over some interval.” The quantity which we are asked to maximize is $A$, but it depends on two variables $x$ and $y$ instead of just one variable. However, the variables $x$ and $y$ are not independent since we are only allowed to consider rectangles with $L = 1$. From this equation we get

$$L = 1 \implies y = \tfrac12 - x.$$

Hence we must find the maximum of the quantity

$$A = xy = x\left(\tfrac12 - x\right).$$

The values of $x$ which we are allowed to consider are only limited by the requirements $x \ge 0$ and $y \ge 0$, i.e. $x \le \tfrac12$. So we end up with this problem:

Find the maximum of the function $f(x) = x(\tfrac12 - x)$ on the interval $0 \le x \le \tfrac12$.

Before we start computing anything we note that the function $f$ is a polynomial so that it is differentiable, and hence continuous, and also that the interval $0 \le x \le \tfrac12$ is closed. Therefore the theory guarantees that there is a maximum and our recipe will show us where it is.

The derivative is given by

$$f'(x) = \tfrac12 - 2x,$$

and hence the only stationary point is $x = \tfrac14$. The function value at this point is

$$f\left(\tfrac14\right) = \tfrac14\left(\tfrac12 - \tfrac14\right) = \tfrac1{16}.$$

At the endpoints one has $x = 0$ or $x = \tfrac12$, which corresponds to a rectangle one of whose sides has length zero. The area of such rectangles is zero, and so this is not the maximal value we are looking for.

We conclude that the largest area is attained by the rectangle whose sides have lengths

$$x = \tfrac14, \quad\text{and}\quad y = \tfrac12 - \tfrac14 = \tfrac14,$$

i.e. by a square with sides $\tfrac14$.
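The recipe from this section, applied to the rectangle problem, can be sketched in a few lines. The variable names below are chosen for this illustration; the stationary point $x = \tfrac14$ is taken from the computation above rather than found by the program.

```python
def area(x):
    """Area of the rectangle with sides x and 1/2 - x (perimeter 1)."""
    return x * (0.5 - x)

stationary_points = [0.25]        # solution of f'(x) = 1/2 - 2x = 0
endpoints = [0.0, 0.5]            # the interval is 0 <= x <= 1/2

# Steps (2) to (4): compare the function values at all candidates.
candidates = stationary_points + endpoints
best_x = max(candidates, key=area)
best_area = area(best_x)
print(best_x, 0.5 - best_x, best_area)
```

The same comparison with `min` in place of `max` answers the question of the smallest area, which is attained at the endpoints where the rectangle degenerates.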


15. Exercises 


269. By definition, the perimeter of a rectangle is the sum 
of the lengths of its four sides. Which rectangle, of all 
those whose perimeter is 1, has the smallest area? Which 
one has the largest area? 

270. Which rectangle of area 100 in² minimizes its height
plus two times its length?

271. You have 1 yard of string from which you make a
circular wedge with radius $R$ and opening angle $\theta$. Which
choice of $\theta$ and $R$ will give you the wedge with the largest
area? Which choice leads to the smallest area?

[A circular wedge is the figure consisting of two radii 
of a circle and the arc connecting them. So the yard of 
string is used to form the two radii and the arc.] 

272. Group Problem. 

(The lamp post problem) 

In a street two lamp posts are 300 feet apart. The 
light intensity at a distance d from the first lamp post is 
1000/d 2 , the light intensity at distance d from the second 
(weaker) lamp post is 125/d 2 (in both cases the light 
intensity is inversely proportional to the square of the 
distance to the light source). 


[Figure: the two lamp posts, 300 ft apart, with a position $x$ marked between them.]

The combined light intensity is the sum of the two 
light intensities coming from both lamp posts. 

(a) If you are in between the lamp posts, at distance 
x feet from the stronger light, then give a formula for the 
combined light intensity coming from both lamp posts as 
a function of x. 

(b) What is the darkest spot between the two lights, 
i.e. where is the combined light intensity the smallest? 

273. (a) You have a sheet of metal with area 100 in 2 from 
which you are to make a cylindrical soup can. If r is the 
radius of the can and h its height, then which h and r 
will give you the can with the largest volume? 

(b) If instead of making a plain cylinder you replaced
the flat top and bottom of the cylinder with two spherical
caps (using the same 100 in² of sheet metal), then
which choice of radius and height of the cylinder gives you
the container with the largest volume?

(c) Suppose you only replace the top of the cylinder
with a spherical cap, and leave the bottom flat. Then
which choice of height and radius of the cylinder results in
the largest volume?

274. A triangle has one vertex at the origin $O(0, 0)$, another
at the point $A(2a, 0)$ and the third at $(a, a/(1 + a^3))$.
What are the largest and smallest areas this triangle can
have if $0 < a < \infty$?

275. Group Problem. 

Queen Dido’s problem 

According to tradition Dido was the founder and first 
Queen of Carthage. When she arrived on the north coast 
of Africa (~800BC) the locals allowed her to take as 
much land as could be enclosed with the hide of one ox. 
She cut the hide into thin strips and put these together 



(a) If Dido wanted a rectangular region, then how 
wide should she choose it to enclose as much area as 
possible (the coastal edge of the boundary doesn’t count, 
so in this problem the length AB + BC + CD is 100 
yards.) 

(b) If Dido chose a region in the shape of an isosceles
triangle $PQR$, then how wide should she make it to
maximize its area (again, don’t include the coast in the
perimeter: $PQ + QR$ is 100 yards long, and $PQ = QR$)?

276. The product of two numbers x,y is 16. We know 
x > 1 and y > 1. What is the greatest possible sum of 
the two numbers? 


277. What are the smallest and largest values that
$(\sin x)(\sin y)$ can have if $x + y = \pi$ and if $x$ and $y$
are both nonnegative?

278. What are the smallest and largest values that 

(cosa;)(cosy) can have if x + y = ^ and if x and y 

are both nonnegative? 

279. (a) What are the smallest and largest values that 
tana; + tany can have if x + y = 5 and if x and y are 
both nonnegative? 

(b) What are the smallest and largest values that 
tan x + 2 tan y can have if x + y = \ and if x and y are 
both nonnegative? 

280. The cost per hour of fuel to run a locomotive is $v^2/25$
dollars, where $v$ is speed (in miles per hour), and other
costs are $100 per hour regardless of speed. What is the
speed that minimizes cost per mile?

281. Group Problem. 

Josh is in need of coffee. He has a circular filter
with 3 inch radius. He cuts out a wedge and glues the
two edges $AC$ and $BC$ together to make a conical filter
to hold the ground coffee. The volume $V$ of the coffee
cone depends on the angle $\theta$ of the piece of filter paper Josh
made.



(a) Find the volume in terms of the angle 9. (Hint: 
how long is the circular arc AB on the left? How long 
is the circular top of the cone on the right? If you know 
that you can find the radius AD = BD of the top of the 
cone, and also the height CD of the cone.) 

(b) Which angle 6 maximizes the volume V? 


made that number up. For the rest start at http://en.wikipedia.org/wiki/Dido 
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Exponentials and Logarithms (naturally) 


In this chapter we first recall some facts about exponentials ($x^y$ with $x > 0$ and $y$ arbitrary): they should
be familiar from algebra, or “precalculus.” What is new is perhaps the definition of $x^y$ when $y$ is not a
fraction: e.g., $2^{3/4}$ is the 4th root of the third power of 2 ($\sqrt[4]{2^3}$), but what is $2^{\sqrt{2}}$?

Then we ask “what is the derivative of $f(x) = a^x$?” The answer leads us to the famous number
$$e \approx 2.718\,281\,828\,459\,045\,235\,360\,287\,471\,352\,662\,497\,757\,247\,093\,699\,95\cdots.$$

Finally, we compute the derivative of $f(x) = \log_a x$, and we look at things that “grow exponentially.”


1. Exponents 


Here we go over the definition of $x^y$ when $x$ and $y$ are arbitrary real numbers, with $x > 0$.
For any real number $x$ and any positive integer $n = 1, 2, 3, \ldots$ one defines
$$x^n = \overbrace{x \cdot x \cdots x}^{n \text{ times}}$$
and, if $x \neq 0$,
$$x^{-n} = \frac{1}{x^n}.$$
One defines $x^0 = 1$ for any $x \neq 0$.

To define $x^{p/q}$ for a general fraction $\frac{p}{q}$ one must assume that the number $x$ is positive. One then defines
(37) $$x^{p/q} = \sqrt[q]{x^p}.$$

This does not tell us how to define $x^a$ if the exponent $a$ is not a fraction. One can define $x^a$ for irrational
numbers $a$ by taking limits. For example, to define $2^{\sqrt{2}}$, we look at the sequence of numbers you get by
truncating the decimal expansion of $\sqrt{2}$, i.e.
$$a_1 = 1, \quad a_2 = 1.4 = \tfrac{14}{10}, \quad a_3 = 1.41 = \tfrac{141}{100}, \quad a_4 = 1.414 = \tfrac{1414}{1000}, \quad \cdots.$$
Each $a_n$ is a fraction, so that we know what $2^{a_n}$ is, e.g. $2^{a_4} = \sqrt[1000]{2^{1414}}$. Our definition of $2^{\sqrt{2}}$ then is
$$2^{\sqrt{2}} = \lim_{n\to\infty} 2^{a_n},$$
i.e. we define $2^{\sqrt{2}}$ as the limit of the sequence of numbers
$$2, \quad \sqrt[10]{2^{14}}, \quad \sqrt[100]{2^{141}}, \quad \sqrt[1000]{2^{1414}}, \quad \cdots$$
(See table 1.)

Here one ought to prove that this limit exists, and that its value does not depend on the particular choice
of numbers $a_n$ tending to $a$. We will not go into these details in this course.

It is shown in precalculus texts that the exponential functions satisfy the following properties:

(38) $$x^a x^b = x^{a+b}, \qquad \frac{x^a}{x^b} = x^{a-b}, \qquad (x^a)^b = x^{ab}$$

provided $a$ and $b$ are fractions. One can show that these properties still hold if $a$ and $b$ are real numbers (not
necessarily fractions.) Again, we won’t go through the proofs here.

x               2^x
1.0000000000    2.000000000000
1.4000000000    2.639015821546
1.4100000000    2.657371628193
1.4140000000    2.664749650184
1.4142000000    2.665119088532
1.4142100000    2.665137561794
1.4142130000    2.665143103798
1.4142135000    2.665144027466

Table 1. Approximating $2^{\sqrt{2}}$. Note that as $x$ gets closer to $\sqrt{2}$ the quantity $2^x$ appears to converge to
some number. This limit is our definition of $2^{\sqrt{2}}$.
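The truncation procedure behind Table 1 can be sketched in a few lines of Python. This is only an illustration: the digit string `SQRT2_DIGITS` and the helper `truncations` are names assumed for the example, and floating-point arithmetic stands in for the exact roots used in the text.

```python
import math
from fractions import Fraction

SQRT2_DIGITS = "1.4142135"  # assumed input: leading digits of sqrt(2), as in Table 1

def truncations(digits, count):
    """Return the first `count` decimal truncations of `digits` as exact fractions."""
    point = digits.index(".")
    result = []
    for n in range(count):
        # n = 0 keeps only the integer part; n >= 1 keeps n decimal places
        text = digits[:point] if n == 0 else digits[:point + 1 + n]
        result.append(Fraction(text))
    return result

# Each a_n is a fraction, so 2**a_n is an ordinary root of a power of 2.
rows = [(float(a), 2.0 ** float(a)) for a in truncations(SQRT2_DIGITS, 8)]
for a, power in rows:
    print(f"{a:.10f}  {power:.12f}")

limit_estimate = 2.0 ** math.sqrt(2.0)  # the value the sequence appears to approach
```

The printed rows should track the table, and the last row agrees with `limit_estimate` to about six decimal places.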


Now instead of considering $x^a$ as a function of $x$ we can pick a positive number $a$ and consider the
function $f(x) = a^x$. This function is defined for all real numbers $x$ (as long as the base $a$ is positive).


1.1. The trouble with powers of negative numbers. The cube root of a negative number is well
defined. For instance $\sqrt[3]{-8} = -2$ because $(-2)^3 = -8$. In view of the definition (37) of $x^{p/q}$ we can write
this as
$$(-8)^{1/3} = \sqrt[3]{(-8)^1} = \sqrt[3]{-8} = -2.$$
But there is a problem: since $\frac{2}{6} = \frac{1}{3}$ you would think that $(-8)^{2/6} = (-8)^{1/3}$. However our definition (37)
tells us that
$$(-8)^{2/6} = \sqrt[6]{(-8)^2} = \sqrt[6]{+64} = +2.$$

Another example:
$$(-4)^{1/2} = \sqrt{-4} \quad\text{is not defined}$$
but, even though $\frac{1}{2} = \frac{2}{4}$,
$$(-4)^{2/4} = \sqrt[4]{(-4)^2} = \sqrt[4]{+16} = 2 \quad\text{is defined.}$$


There are two ways out of this mess: 

(1) avoid taking fractional powers of negative numbers 

(2) when you compute $x^{p/q}$ first simplify the fraction by removing common divisors of $p$ and $q$.


The safest is just not to take fractional powers of negative numbers. 
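The inconsistency is easy to reproduce if definition (37) is applied literally, without first reducing the fraction. The sketch below is an illustration only; `real_root` and `power_by_definition` are hypothetical helper names, and only real-valued roots are considered.

```python
def real_root(value, q):
    """Real q-th root of `value`; even roots of negative numbers are rejected."""
    if value >= 0:
        return value ** (1.0 / q)
    if q % 2 == 0:
        raise ValueError("even root of a negative number is not a real number")
    # odd root of a negative number: minus the root of the absolute value
    return -((-value) ** (1.0 / q))

def power_by_definition(x, p, q):
    """x^(p/q) computed as the q-th root of x^p, with p/q NOT reduced."""
    return real_root(x ** p, q)

print(power_by_definition(-8.0, 1, 3))  # cube root of -8
print(power_by_definition(-8.0, 2, 6))  # sixth root of 64, although 2/6 = 1/3
```

The two calls use equal exponents but give values of opposite sign, which is exactly the trouble described above; `power_by_definition(-4.0, 1, 2)` raises an error while `power_by_definition(-4.0, 2, 4)` does not.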

Given that fractional powers of negative numbers cause all these headaches it is not surprising that we
didn’t try to define $x^a$ for negative $x$ if $a$ is irrational. For example, $(-8)^{\pi}$ is not defined$^1$.


2. Logarithms 

Briefly, $y = \log_a x$ is the inverse function to $y = a^x$. This means that, by definition,
$$y = \log_a x \iff x = a^y.$$
In other words, $\log_a x$ is the answer to the question “for which number $y$ does one have $x = a^y$?” The number
$\log_a x$ is called the logarithm with base $a$ of $x$. In this definition both $a$ and $x$ must be positive.

For instance,
$$2^3 = 8, \qquad 2^{1/2} = \sqrt{2}, \qquad 2^{-1} = \tfrac{1}{2}$$
so
$$\log_2 8 = 3, \qquad \log_2(\sqrt{2}) = \tfrac{1}{2}, \qquad \log_2 \tfrac{1}{2} = -1.$$

1 There is a definition of $(-8)^{\pi}$ which uses complex numbers. You will see this next semester if you take math 222.
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Figure 1. 

labeled: can you figure out which is which?


Also: 


$$\log_2(-3) \text{ doesn’t exist}$$
because there is no number $y$ for which $2^y = -3$ ($2^y$ is always positive) and
$$\log_{-3} 2 \text{ doesn’t exist either}$$
because $y = \log_{-3} 2$ would have to be some real number which satisfies $(-3)^y = 2$, and we don’t take
non-integer powers of negative numbers.


3. Properties of logarithms 


In general one has 

$$\log_a a^x = x, \quad\text{and}\quad a^{\log_a x} = x.$$
There is a subtle difference between these formulas: the first one holds for all real numbers $x$, but the second
only holds for $x > 0$, since $\log_a x$ doesn’t make sense for $x \le 0$.

Again, one finds the following formulas in precalculus texts:

(39)
$$\begin{aligned}
\log_a xy &= \log_a x + \log_a y \\
\log_a \frac{x}{y} &= \log_a x - \log_a y \\
\log_a x^y &= y \log_a x \\
\log_a x &= \frac{\log_b x}{\log_b a}
\end{aligned}$$

They follow from (38).

4. Graphs of exponential functions and logarithms 

Figure 1 shows the graphs of some exponential functions $y = a^x$ with different values of $a$, and figure 2
shows the graphs of $y = \log_2 x$, $y = \log_3 x$, $\log_{3/2} x$, $\log_{1/3}(x)$ and $y = \log_{10} x$. Can you tell which is which?
(Yes, you can.)

From algebra/precalc recall:

If $a > 1$ then $f(x) = a^x$ is an increasing function,
and
if $0 < a < 1$ then $f(x) = a^x$ is a decreasing function.

In other words, for $a > 1$ it follows from $x_1 < x_2$ that $a^{x_1} < a^{x_2}$; if $0 < a < 1$, then $x_1 < x_2$ implies $a^{x_1} > a^{x_2}$.
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Figure 2. Graphs of some logarithms. Each curve is the graph of a function y = log a x for various 
values of a > 0. Can you tell what a is for each graph? 


5. The derivative of $a^x$ and the definition of $e$


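To begin, we try to differentiate the function $y = 2^x$:
$$\begin{aligned}
\frac{d\,2^x}{dx} &= \lim_{\Delta x\to 0} \frac{2^{x+\Delta x} - 2^x}{\Delta x} \\
&= \lim_{\Delta x\to 0} \frac{2^x\,2^{\Delta x} - 2^x}{\Delta x} \\
&= \lim_{\Delta x\to 0} 2^x\,\frac{2^{\Delta x} - 1}{\Delta x} \\
&= 2^x \lim_{\Delta x\to 0} \frac{2^{\Delta x} - 1}{\Delta x}.
\end{aligned}$$
So if we assume that the limit
$$\lim_{\Delta x\to 0} \frac{2^{\Delta x} - 1}{\Delta x} = C$$
exists then we have
(40) $$\frac{d\,2^x}{dx} = C\,2^x.$$

On your calculator you can compute $\frac{2^{\Delta x} - 1}{\Delta x}$ for smaller and smaller values of $\Delta x$, which leads you to suspect
that the limit actually exists, and that $C \approx 0.693\,147\ldots$. One can in fact prove that the limit exists, but we
will not do this here.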
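The calculator experiment can also be run as a short script. This is a sketch only; the step sizes in `steps` are an arbitrary choice, and the comparison with `math.log(2)` relies on the standard fact that the limit equals the natural logarithm of 2.

```python
import math

def difference_quotient(dx):
    """The quantity (2**dx - 1) / dx whose limit as dx -> 0 is the constant C."""
    return (2.0 ** dx - 1.0) / dx

steps = [10.0 ** (-k) for k in range(1, 8)]  # 0.1, 0.01, ..., 1e-7
estimates = [difference_quotient(dx) for dx in steps]

for dx, value in zip(steps, estimates):
    print(f"dx = {dx:.0e}   quotient = {value:.9f}")

print("math.log(2) =", math.log(2.0))
```

Much smaller steps than these are not helpful in floating point, because the subtraction `2.0 ** dx - 1.0` loses digits to rounding.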

Once we know (40) we can compute the derivative of $a^x$ for any other positive number $a$. To do this we
write $a = 2^{\log_2 a}$, and hence
$$a^x = \left(2^{\log_2 a}\right)^x = 2^{x \log_2 a}.$$
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By the chain rule we therefore get 


$$\begin{aligned}
\frac{d\,a^x}{dx} &= \frac{d\,2^{x\log_2 a}}{dx} \\
&= C\,2^{x\log_2 a} \cdot \frac{d\,(x \log_2 a)}{dx} \\
&= (C \log_2 a)\,2^{x\log_2 a} \\
&= (C \log_2 a)\,a^x.
\end{aligned}$$

So the derivative of $a^x$ is just some constant times $a^x$, the constant being $C\log_2 a$. This is essentially our
formula for the derivative of $a^x$, but one can make the formula look nicer by introducing a special number,
namely, we define
$$e = 2^{1/C} \quad\text{where}\quad C = \lim_{\Delta x\to 0} \frac{2^{\Delta x} - 1}{\Delta x}.$$
One has
$$e \approx 2.718\,281\,828\,459\cdots$$
This number is special because if you set $a = e$, then
$$C \log_2 a = C \log_2 e = C \log_2 2^{1/C} = C \cdot \frac{1}{C} = 1,$$
and therefore the derivative of the function $y = e^x$ is
(41) $$\frac{d\,e^x}{dx} = e^x.$$

Read that again: the function $e^x$ is its own derivative!

The logarithm with base $e$ is called the Natural Logarithm, and is written
$$\ln x = \log_e x.$$
Thus we have
(42) $$e^{\ln x} = x, \qquad \ln e^x = x$$
where the second formula holds for all real numbers $x$ but the first one only makes sense for $x > 0$.
For any positive number $a$ we have $a = e^{\ln a}$, and also
$$a^x = e^{x \ln a}.$$
By the chain rule you then get
(43) $$\frac{d\,a^x}{dx} = a^x \ln a.$$
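Formula (43) can be checked numerically by comparing it with a difference quotient. The sketch below is a sanity check under assumed values, not a proof; `central_difference` is a generic helper introduced for the example and the step `h` is an arbitrary choice.

```python
import math

def central_difference(f, x, h=1e-6):
    """Symmetric difference quotient approximating f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

def check_exponential_derivative(a, x):
    """Return (numerical derivative of a**x at x, value of a**x * ln a)."""
    numeric = central_difference(lambda t: a ** t, x)
    formula = a ** x * math.log(a)
    return numeric, formula

for base in (2.0, 3.0, math.e, 0.5):
    numeric, formula = check_exponential_derivative(base, 1.5)
    print(f"a = {base:.4f}   numeric = {numeric:.8f}   formula = {formula:.8f}")
```

For the base `math.e` the factor `math.log(a)` equals 1, so both columns reduce to the value of the function itself, in line with (41).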


6. Derivatives of Logarithms 

Since the natural logarithm is the inverse function of $f(x) = e^x$ we can find its derivative by implicit
differentiation. Here is the computation (which you should do yourself).

The function $f(x) = \log_a x$ satisfies
$$a^{f(x)} = x.$$
Differentiate both sides, and use the chain rule on the left,
$$(\ln a)\,a^{f(x)} f'(x) = 1.$$
Then solve for $f'(x)$ to get
$$f'(x) = \frac{1}{(\ln a)\,a^{f(x)}}.$$
Finally we remember that $a^{f(x)} = x$ which gives us the derivative of $\log_a x$
$$\frac{d \log_a x}{dx} = \frac{1}{x \ln a}.$$
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In particular, the natural logarithm has a very simple derivative, namely, since lne = 1 we have 


(44) $$\frac{d \ln x}{dx} = \frac{1}{x}.$$


7. Limits involving exponentials and logarithms 

7.1. Theorem. Let $r$ be any real number. Then, if $a > 1$,
$$\lim_{x\to\infty} x^r a^{-x} = 0,$$
i.e.
$$\lim_{x\to\infty} \frac{x^r}{a^x} = 0.$$

This theorem says that any exponential will beat any power of $x$ as $x \to \infty$. For instance, as $x \to \infty$ both
$x^{1000}$ and $(1.001)^x$ go to infinity, but
$$\lim_{x\to\infty} \frac{x^{1000}}{(1.001)^x} = 0,$$
so, in the long run, for very large $x$, $1.001^x$ will be much larger than $x^{1000}$.

PROOF when $a = e$. We want to show $\lim_{x\to\infty} x^r e^{-x} = 0$. To do this consider the function $f(x) =
x^{r+1} e^{-x}$. Its derivative is
$$f'(x) = \frac{d\,x^{r+1}e^{-x}}{dx} = \left((r+1)x^r - x^{r+1}\right)e^{-x} = (r + 1 - x)\,x^r e^{-x}.$$
Therefore $f'(x) < 0$ for $x > r + 1$, i.e. $f(x)$ is decreasing for $x > r + 1$. It follows that $f(x) < f(r + 1)$ for all
$x > r + 1$, i.e.
$$x^{r+1} e^{-x} < (r+1)^{r+1} e^{-(r+1)} \quad\text{for } x > r + 1.$$
Divide by $x$, abbreviate $A = (r+1)^{r+1} e^{-(r+1)}$, and we get
$$0 < x^r e^{-x} < \frac{A}{x} \quad\text{for all } x > r + 1.$$
The Sandwich Theorem implies that $\lim_{x\to\infty} x^r e^{-x} = 0$, which is what we had promised to show.

□
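The example with $x^{1000}$ and $(1.001)^x$ cannot be evaluated directly in floating point, since both numbers overflow, but the logarithm of their ratio can. The following sketch is an illustration; `log_ratio` is a hypothetical helper and the sample points are arbitrary.

```python
import math

def log_ratio(x, r=1000.0, a=1.001):
    """Natural logarithm of x**r / a**x, computed without forming either power."""
    return r * math.log(x) - x * math.log(a)

for x in (1e3, 1e5, 1e7, 1e8, 1e9, 1e10):
    print(f"x = {x:.0e}   ln(x^1000 / 1.001^x) = {log_ratio(x):.1f}")
```

The logarithm is positive for moderately large $x$, so the power is still ahead there, and it becomes negative and keeps decreasing once $x$ is large enough, which is what the theorem predicts.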


Here are some related limits:

$$a > 1 \implies \lim_{x\to\infty} \frac{a^x}{x^r} = \infty$$
$$m > 0 \implies \lim_{x\to\infty} \frac{\ln x}{x^m} = 0$$
$$m > 0 \implies \lim_{x\searrow 0} x^m \ln x = 0$$

The second limit says that even though $\ln x$ becomes infinitely large as $x \to \infty$, it is always much less than
any power $x^m$ with $m > 0$ real. To prove it you set $x = e^t$ and then $t = s/m$, which leads to
$$\lim_{x\to\infty} \frac{\ln x}{x^m} \overset{x = e^t}{=} \lim_{t\to\infty} \frac{t}{e^{mt}} \overset{t = s/m}{=} \frac{1}{m} \lim_{s\to\infty} \frac{s}{e^s} = 0.$$

The third limit follows from the second by substituting $x = 1/y$ and using $\ln\frac{1}{x} = -\ln x$.
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8. Exponential growth and decay 


A quantity $X$ which depends on time $t$ is said to grow or decay exponentially if it is given by

(45) $$X(t) = X_0 e^{kt}.$$

The constant $X_0$ is the value of $X(t)$ at time $t = 0$ (sometimes called “the initial value of $X$”).

The derivative of an exponentially growing quantity, i.e. its rate of change with time, is given by
$X'(t) = X_0 k e^{kt}$ so that

(46) $$\frac{dX(t)}{dt} = kX(t).$$


In words, for an exponentially growing quantity the rate of change is always proportional to the quantity itself. 
The proportionality constant is k and is sometimes called “the relative growth rate.” 

This property of exponential functions completely describes them, by which I mean that any function
which satisfies (46) automatically satisfies (45). To see that this is true, suppose you have a function $X(t)$ for
which $X'(t) = kX(t)$ holds at all times $t$. Then
$$\begin{aligned}
\frac{d\,X(t)e^{-kt}}{dt} &= X(t)\frac{d\,e^{-kt}}{dt} + \frac{dX(t)}{dt}\,e^{-kt} \\
&= -kX(t)e^{-kt} + X'(t)e^{-kt} \\
&= \left(X'(t) - kX(t)\right)e^{-kt} \\
&= 0.
\end{aligned}$$
It follows that $X(t)e^{-kt}$ does not depend on $t$. At $t = 0$ one has
$$X(t)e^{-kt} = X(0)e^{0} = X_0$$
and therefore we have
$$X(t)e^{-kt} = X_0 \quad\text{for all } t.$$
Multiply with $e^{kt}$ and we end up with
$$X(t) = X_0 e^{kt}.$$


8.1. Half time and doubling time. If $X(t) = X_0 e^{kt}$ then one has
$$X(t + T) = X_0 e^{kt + kT} = X_0 e^{kt} e^{kT} = e^{kT} X(t).$$
In words, after time $T$ goes by an exponentially growing (decaying) quantity changes by a factor $e^{kT}$. If
$k > 0$, so that the quantity is actually growing, then one calls
$$T = \frac{\ln 2}{k}$$
the doubling time for $X$ because $X(t)$ changes by a factor $e^{kT} = e^{\ln 2} = 2$ every $T$ time units: $X(t)$ doubles
every $T$ time units.

If $k < 0$ then $X(t)$ is decaying and one calls
$$T = \frac{\ln 2}{-k}$$
the half life because $X(t)$ is reduced by a factor $e^{kT} = e^{-\ln 2} = \frac{1}{2}$ every $T$ time units.
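The doubling time and half life translate directly into code. The sketch below assumes the model $X(t) = X_0 e^{kt}$ from (45); the function names and the sample rates are made up for the illustration.

```python
import math

def doubling_time(k):
    """T = ln 2 / k for a growing quantity (k > 0)."""
    if k <= 0:
        raise ValueError("doubling time needs a positive growth rate k")
    return math.log(2.0) / k

def half_life(k):
    """T = ln 2 / (-k) for a decaying quantity (k < 0)."""
    if k >= 0:
        raise ValueError("half life needs a negative rate k")
    return math.log(2.0) / (-k)

def value_at(x0, k, t):
    """X(t) = X0 * exp(k t)."""
    return x0 * math.exp(k * t)

T = doubling_time(0.02)   # hypothetical growth rate of 2% per time unit
H = half_life(-0.1)       # hypothetical decay rate
print(T, value_at(5.0, 0.02, T))   # the second number is twice the initial value 5.0
print(H, value_at(8.0, -0.1, H))   # the second number is half the initial value 8.0
```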
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8.2. Determining A'o and k. The general exponential growth/decay function (45) contains only two 
constants, $X_0$ and $k$, and if you know the values of $X(t)$ at two different times then you can compute these
constants.

Suppose that you know
$$X_1 = X(t_1) \quad\text{and}\quad X_2 = X(t_2).$$
Then we have
$$X_0 e^{kt_1} = X_1 \quad\text{and}\quad X_2 = X_0 e^{kt_2}$$
in which $t_1, t_2, X_1, X_2$ are given and $k$ and $X_0$ are unknown. One first finds $k$ from
$$\frac{X_1}{X_2} = \frac{X_0 e^{kt_1}}{X_0 e^{kt_2}} = e^{k(t_1 - t_2)} \implies \ln\frac{X_1}{X_2} = k(t_1 - t_2)$$
which implies
$$k = \frac{\ln X_1 - \ln X_2}{t_1 - t_2}.$$
Once you have computed $k$ you can find $X_0$ from
$$X_0 = \frac{X_1}{e^{kt_1}} = \frac{X_2}{e^{kt_2}}$$
(both expressions should give the same result.)
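The two-step recipe (first $k$, then $X_0$) can be written as a small function. This is a sketch under the assumption that both measurements are positive and taken at different times; `fit_exponential` and the sample data are hypothetical.

```python
import math

def fit_exponential(t1, x1, t2, x2):
    """Return (X0, k) such that X0*exp(k*t1) = x1 and X0*exp(k*t2) = x2."""
    if t1 == t2:
        raise ValueError("the two measurement times must differ")
    if x1 <= 0 or x2 <= 0:
        raise ValueError("both measured values must be positive")
    k = (math.log(x1) - math.log(x2)) / (t1 - t2)
    x0 = x1 / math.exp(k * t1)  # x2 / exp(k * t2) gives the same number
    return x0, k

# Made-up measurements generated from X(t) = 3 e^{0.5 t} at t = 1 and t = 4.
x0, k = fit_exponential(1.0, 3.0 * math.exp(0.5), 4.0, 3.0 * math.exp(2.0))
print(x0, k)
```

Comparing `x1 / math.exp(k * t1)` with `x2 / math.exp(k * t2)` is a cheap consistency check on the computed `k`.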


9. Exercises 


Sketch the graphs of the following functions. 

(Hint for some of these: if you have to solve something
like $e^{4x} - 3e^{3x} + e^x = 0$, then call $w = e^x$, and you
get a polynomial equation for $w$, namely $w^4 - 3w^3 + w = 0$.)

282. $y = e^x$

283. $y = e^{-x}$

284. $y = e^x + e^{-2x}$

285. $y = e^{3x} - 4e^x$


298. y = l n yH| (|x|<l) 

299. $y = \ln(1 + x^2)$

300. $y = \ln(x^2 - 3x + 2)$ $(x > 2)$

301. $y = \ln\cos x$ $(|x| < \frac{\pi}{2})$

302. The function $f(x) = e^{-x^2}$ plays a central role in statistics
and its graph is called the bell curve (because of its
shape). Sketch the graph of $f$.

303. Sketch the part of the graph of the function 


286. y 


l + e x 


f(x) = e * 


287. y 


2e x 

1 + e 2x 


with x > 0. 

Find the limits 


288. $y = xe^{-x}$

289. $y = \sqrt{x}\,e^{-x/4}$

290. $y = x^2 e^{x+2}$

291. $y = e^{x/2} - x$

292. $y = \ln\sqrt{x}$

293. $y = \ln\frac{1}{x}$

294. $y = x \ln x$

295. y = -—— (0 < x < oo,x 1) 

In x 

296. $y = (\ln x)^2$ $(x > 0)$

297. y = — (x > 0) 

X 


lim and lim f(x) 

x\0 X n x—too 

where n can be any positive integer (hint: substitute 
x = . . .?) 

304. A damped oscillation is a function of the form

$f(x) = e^{-ax}\cos bx$ or $f(x) = e^{-ax}\sin bx$

where $a$ and $b$ are constants.

Sketch the graph of $f(x) = e^{-x}\sin 10x$ (i.e. find zeroes,
local max and mins, inflection points) and draw (with
pencil on paper) the piece of the graph with $0 \le x \le 2\pi$.

This function has many local maxima and minima.
What is the ratio between the function values at two
consecutive local maxima? (Hint: the answer does not
depend on which pair of consecutive local maxima you
consider.)
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305. Find the inflection points on the graph of /(*) = 
$(1 + x)\ln x$ $(x > 0)$.

306. (a) If $x$ is large, which is bigger: $2^x$ or $x^2$?

(b) The graphs of $f(x) = x^2$ and $g(x) = 2^x$ intersect at
$x = 2$ (since $2^2 = 2^2$). How many more intersections do
these graphs have (with $-\infty < x < \infty$)?

Find the following limits. 


307. $\displaystyle\lim_{x\to\infty} \frac{e^x - 1}{e^x + 1}$

308. 

lim 

e x - x 2 


X — 

e x + x 

309. $\displaystyle\lim_{x\to\infty} \frac{2^x}{3^x - 2^x}$

310. $\displaystyle\lim_{x\to\infty} \frac{e^x - x^2}{e^{2x} + e^{-x}}$

311. 

lim 

„ — x „ — x/'. 


x—>oo 

y/e x + 1 

312. 

lim 

\J x + e^ x 


x—>oo 

e 2x + x 

313. $\displaystyle\lim_{x\to\infty} \frac{e^{\sqrt{x}}}{\sqrt{e^x + 1}}$

314. 

lim ln(l + *) — 

315. 

x—too 

lim 

In* 


x—too 

In* 2 

316. 

lim x In * 


317. lim ln ^' — 

X->00 y/X + In x 

318. lim * nX — 

*->o y/x + In a? 

319. Find the tenth derivative of xe x . 

320. For which real number $x$ is $2^x - 3^x$ the largest?


321. Find 


dx x dx x 
dx ' dx 


322. Group Problem. 

About logarithmic differentiation: 

(a) Let $y = (x + 1)^2 (x + 3)^4 (x + 5)^6$ and $u = \ln y$. Find
$du/dx$. Hint: Use the fact that $\ln$ converts multiplication
to addition before you differentiate. It will simplify the
calculation.

(b) Check that the derivative of $\ln u(x)$ is the logarithmic
derivative of the function $u$ (as defined in the exercises
following §25, chapter 4.)


323. After 3 days a sample of radon-222 decayed to 58% 
of its original amount. 

(a) What is the half life of radon-222? 

(b) How long would it take the sample to decay to 10% 
of its original amount? 


324. Polonium-210 has a half life of 140 days. 

(a) If a sample has a mass of 200 mg find a formula for 
the mass that remains after t days. 

(b) Find the mass after 100 days. 

(c) When will the mass be reduced to 10 mg? 

(d) Sketch the graph of the mass as a function of time. 

325. Current agricultural experts believe that the world’s
farms can feed about 10 billion people. The 1950 world
population was 2.517 billion and the 1992 world population
was 5.4 billion. When can we expect to run out of
food?

326. Group Problem. 

The ACME company runs two ads on Sunday mornings.
One says that “when this baby is old enough to vote,
the world will have one billion new mouths to feed” and
the other says “in thirty six years, the world will have to
set eight billion places at the table.” What does ACME
think the population of the world is at present? How
fast does ACME think the population is increasing? Use
units of billions of people so you can write 8 instead of
8,000,000,000. (Hint: 36 = 2 × 18.)

327. The population of California grows exponentially at 
an instantaneous rate of 2% per year. The population of 
California on January 1, 2000 was 20,000,000. 

(a) Write a formula for the population $N(t)$ of California
$t$ years after January 1, 2000.

(b) Each Californian consumes pizzas at the rate of 
70 pizzas per year. At what rate is California consuming 
pizzas t years after 1990? 

(c) How many pizzas were consumed in California 
from January 1, 2005 to January 1, 2009? 

328. The population of the country of Farfaraway grows 
exponentially. 

(a) If its population in the year 1980 was 1,980,000 
and its population in the year 1990 was 1,990,000, what 
is its population in the year 2000? 

(b) How long will it take the population to double? 
(Your answer may be expressed in terms of exponentials 
and natural logarithms.) 

329. The hyperbolic functions are defined by
$$\sinh x = \frac{e^x - e^{-x}}{2}, \qquad \cosh x = \frac{e^x + e^{-x}}{2}, \qquad \tanh x = \frac{\sinh x}{\cosh x}.$$

(a) Prove the following identities
$$\begin{aligned}
\cosh^2 x - \sinh^2 x &= 1 \\
\cosh 2x &= \cosh^2 x + \sinh^2 x \\
\sinh 2x &= 2 \sinh x \cosh x.
\end{aligned}$$
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(b) Show that 


$$\frac{d \sinh x}{dx} = \cosh x, \qquad \frac{d \cosh x}{dx} = \sinh x, \qquad \frac{d \tanh x}{dx} = \frac{1}{\cosh^2 x}.$$

(c) Sketch the graphs of the three hyperbolic functions.
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The Integral 


In this chapter we define the integral of a function on some interval $[a, b]$. The most common interpretation
of the integral is in terms of the area under the graph of the given function, so that is where we begin.


1. Area under a Graph 



Let $f$ be a function which is defined on some interval $a \le x \le b$ and assume it is positive, i.e. assume
that its graph lies above the x axis. How large is the area of the region caught between the x axis, the graph
of $y = f(x)$ and the vertical lines $x = a$ and $x = b$?

You can try to compute this area by approximating the region with many thin rectangles. Look at figure
1 before you read on. To make the approximating region you choose a partition of the interval $[a, b]$, i.e. you
pick numbers $x_1 < \cdots < x_n$ with
$$a = x_0 < x_1 < x_2 < \cdots < x_{n-1} < x_n = b.$$
These numbers split the interval $[a, b]$ into $n$ sub-intervals
$$[x_0, x_1], \quad [x_1, x_2], \quad \ldots, \quad [x_{n-1}, x_n]$$
whose lengths are
$$\Delta x_1 = x_1 - x_0, \quad \Delta x_2 = x_2 - x_1, \quad \ldots, \quad \Delta x_n = x_n - x_{n-1}.$$
In each interval we choose a point $c_k$, i.e. in the first interval we choose $x_0 \le c_1 \le x_1$, in the second interval
we choose $x_1 \le c_2 \le x_2$, ..., and in the last interval we choose some number $x_{n-1} \le c_n \le x_n$. See figure 1.

We then define $n$ rectangles: the base of the $k$th rectangle is the interval $[x_{k-1}, x_k]$ on the x-axis, while
its height is $f(c_k)$ (here $k$ can be any integer from 1 to $n$.)

The area of the $k$th rectangle is of course the product of its height and width, i.e. its area is $f(c_k)\Delta x_k$.
Adding these we see that the total area of the rectangles is

(47) $$R = f(c_1)\Delta x_1 + f(c_2)\Delta x_2 + \cdots + f(c_n)\Delta x_n.$$

This kind of sum is called a Riemann sum.
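A Riemann sum as in (47) is straightforward to compute once the partition and the rule for choosing the points $c_k$ are fixed. The sketch below is an illustration only; `riemann_sum`, `uniform_partition` and the `choose` options are names invented for the example.

```python
def uniform_partition(a, b, n):
    """Division points a = x_0 < x_1 < ... < x_n = b with equal spacing."""
    return [a + (b - a) * k / n for k in range(n + 1)]

def riemann_sum(f, points, choose="left"):
    """Sum of f(c_k) * (x_k - x_{k-1}) over consecutive division points."""
    total = 0.0
    for left, right in zip(points, points[1:]):
        if choose == "left":
            c = left
        elif choose == "right":
            c = right
        elif choose == "mid":
            c = 0.5 * (left + right)
        else:
            raise ValueError("choose must be 'left', 'right' or 'mid'")
        total += f(c) * (right - left)
    return total

def square(x):
    return x * x

points = uniform_partition(0.0, 1.0, 1000)
left_sum = riemann_sum(square, points, "left")
right_sum = riemann_sum(square, points, "right")
print(left_sum, right_sum)  # both close to the area under y = x^2 on [0, 1]
```

Because the function `square` is increasing on $[0, 1]$, the left-endpoint sum stays below the area and the right-endpoint sum stays above it.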

If the partition is sufficiently fine then one would expect this sum, i.e. the total area of all rectangles 
to be a good approximation of the area of the region under the graph. Replacing the partition by a finer 
partition, with more division points, should improve the approximation. So you would expect the area to be 
the limit of Riemann-sums like R “as the partition becomes finer and finer.” A precise formulation of the 
definition goes like this: 
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Figure 1. TOP. A Riemann sum in which the interval a < x < b has been cut up into six smaller 
intervals. In each of those intervals a point $c_k$ has been chosen at random, and the resulting rectangles
with heights $f(c_1), \ldots, f(c_6)$ were drawn. The total area under the graph of the function is roughly
equal to the total area of the rectangles. BOTTOM. Refining the partition. After
adding more partition points the combined area of the rectangles will be a better approximation of the
area under the graph of the function $f$.


1.1. Definition. If $f$ is a function defined on an interval $[a, b]$, then we say that
$$\int_a^b f(x)\,dx = I,$$
i.e. the integral of “$f(x)$ from $x = a$ to $b$” equals $I$, if for every $\varepsilon > 0$ one can find a $\delta > 0$ such that
$$\bigl| f(c_1)\Delta x_1 + f(c_2)\Delta x_2 + \cdots + f(c_n)\Delta x_n - I \bigr| < \varepsilon$$
holds for every partition all of whose intervals have length $\Delta x_k < \delta$.


2. When $f$ changes its sign

If the function $f$ is not necessarily positive everywhere in the interval $a \le x \le b$, then we still define the
integral in exactly the same way: as a limit of Riemann sums whose mesh size becomes smaller and smaller.
However the interpretation of the integral as “the area of the region between the graph and the x-axis” has a
twist to it.

Let $f$ be some function on an interval $a \le x \le b$, and form the Riemann sum
$$R = f(c_1)\Delta x_1 + f(c_2)\Delta x_2 + \cdots + f(c_n)\Delta x_n$$
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Figure 2. Illustrating a Riemann sum for a function whose sign changes. Always remember this: AREAS 
ARE ALWAYS POSITIVE NUMBERS. The Riemann-sum corresponding to this picture is the total 
area of the rectangles above the x-axis minus the total area of the rectangles below the x-axis. 


that goes with some partition, and some choice of $c_k$.

When $f$ can be positive or negative, then the terms in the Riemann sum can also be positive or negative.
If $f(c_k) > 0$ then the quantity $f(c_k)\Delta x_k$ is the area of the corresponding rectangle, but if $f(c_k) < 0$ then
$f(c_k)\Delta x_k$ is a negative number, namely minus the area of the corresponding rectangle. The Riemann sum is
therefore the area of the rectangles above the x-axis minus the area below the axis and above the graph.

Taking the limit over finer and finer partitions, we conclude that

$\int_a^b f(x)\,dx$ = area above the x-axis, below the graph
minus the area below the x-axis, above the graph.


3. The Fundamental Theorem of Calculus 

3.1. Definition. A function $F$ is called an antiderivative of $f$ on the interval $[a, b]$ if one has $F'(x) =
f(x)$ for all $x$ with $a < x < b$.

For instance, $F(x) = \frac{1}{2}x^2$ is an antiderivative of $f(x) = x$, but so is $G(x) = \frac{1}{2}x^2 + 2008$.

3.2. Theorem. If $f$ is a function whose integral $\int_a^b f(x)\,dx$ exists, and if $F$ is an antiderivative of $f$ on
the interval $[a, b]$, then one has

(48) $$\int_a^b f(x)\,dx = F(b) - F(a).$$

(a proof was given in lecture.)

Because of this theorem the expression on the right appears so often that various abbreviations have
been invented. We will abbreviate
$$F(b) - F(a) \overset{\text{def}}{=} \bigl[F(x)\bigr]_{x=a}^{b} = \bigl[F(x)\bigr]_a^b.$$
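A numerical experiment makes the theorem plausible for the example $f(x) = x$ with the two antiderivatives mentioned above. This is a sketch, not a proof; the interval $[1, 3]$, the midpoint rule and the number of rectangles are arbitrary choices made for the illustration.

```python
def midpoint_sum(f, a, b, n):
    """Riemann sum with n equal sub-intervals and c_k chosen as the midpoints."""
    width = (b - a) / n
    return sum(f(a + (k + 0.5) * width) * width for k in range(n))

def f(x):
    return x

def F(x):
    return 0.5 * x * x            # one antiderivative of f

def G(x):
    return 0.5 * x * x + 2008.0   # another antiderivative of f

a, b = 1.0, 3.0
approx = midpoint_sum(f, a, b, 1000)
exact_from_F = F(b) - F(a)
exact_from_G = G(b) - G(a)
print(approx, exact_from_F, exact_from_G)
```

The added constant 2008 cancels in the difference, so both antiderivatives give the same value for the integral.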
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3.3. Terminology. In the integral 


$$\int_a^b f(x)\,dx$$

the numbers $a$ and $b$ are called the bounds of the integral, the function $f(x)$ which is being integrated is called
the integrand, and the variable $x$ is the integration variable.

The integration variable is a dummy variable. If you systematically replace it with another variable, the
resulting integral will still be the same. For instance,
$$\int_0^1 x^2\,dx = \left[\tfrac{1}{3}x^3\right]_{x=0}^{1} = \tfrac{1}{3},$$
and if you replace $x$ by $\varphi$ you still get
$$\int_0^1 \varphi^2\,d\varphi = \left[\tfrac{1}{3}\varphi^3\right]_{\varphi=0}^{1} = \tfrac{1}{3}.$$

Another way to appreciate that the integration variable is a dummy variable is to look at the Fundamental
Theorem again:
$$\int_a^b f(x)\,dx = F(b) - F(a).$$

The right hand side tells you that the value of the integral depends on $a$ and $b$, and has absolutely nothing to
do with the variable $x$.


4. Exercises 


330. What is a Riemann sum of a function $y = f(x)$?

331. Let $f$ be the function $f(x) = 1 - x^2$.

Draw the graph of $f(x)$ with $0 \le x \le 2$.

Compute the Riemann-sum for the partition 

0<|<1<§<2 

of the interval $[a, b] = [0, 2]$ if you choose each $c_k$ to be
the left endpoint of the interval it belongs to. Draw the
corresponding rectangles (add them to your drawing of
the graph of $f$).

Then compute the Riemann-sum you get if you
choose the $c_k$ to be the right endpoint of the interval it
belongs to. Make a new drawing of the graph of $f$ and
include the rectangles corresponding to the right endpoint
Riemann-sum.

332. Group Problem. 

Look at figure 1 (top). Which choice of intermediate
points $c_1, \ldots, c_6$ leads to the smallest Riemann sum?
Which choice would give you the largest Riemann-sum?

(Note: in this problem you’re not allowed to change
the division points $x_i$, only the points $c_k$ in between them.)

Find an antiderivative F(x) for each of the following 
functions f(x). Finding antiderivatives involves a fair 
amount of guess work, but with experience it gets easier 


336. f(x) = x 4 — x 2 

. x 2 x 3 x 4 

337. f(x) = l + x+ — + — + — 

338. f{x) = * 

339. f(x) — e x 

340. f(x)=l 

341. f(x) — e 2x 


342. f(x) = 

343. /(*) = 

344. /(*) = 

345. f(x) = 

346. f(x) = 


2 + x 


1 + x 2 

e x +e~ 
2 

I 

Vl — X 


347. f(x)= sins 

348. f(x) = ^ 




to guess antiderivatives. 

333. $f(x) = 2x + 1$

334. $f(x) = 1 - 3x$

335. $f(x) = x^2 - x + 11$

349. $f(x) = \cos x$

350. $f(x) = \cos 2x$

351. $f(x) = \sin(x - \pi/3)$

352. $f(x) = \sin x + \sin 2x$
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353. f{x) = 2x(l + x 2 ) 5 

In each of the following exercises you should compute 
the area of the indicated region, and also of the smallest 
enclosing rectangle with horizontal and vertical sides. 

Before computing anything draw the region. 

354. The region between the vertical lines $x = 0$ and $x = 1$,
and between the x-axis and the graph of $y = x^3$.


362. The region between the graph of y = 1/x and the 
a>axis, and between x = a and x = b (here 0 < a < b 
are constants, e.g. choose a = 1 and b = \pl if you have 
something against either letter a or 6.) 


363. The region above the aj-axis and below the graph of 

f[x) = ihc + 1" L 


355. The region between the vertical lines $x = 0$ and $x = 1$,
and between the x-axis and the graph of $y = x^n$ (here
$n > 0$, draw for $n = \frac{1}{2}, 1, 2, 3, 4$).

356. The region above the graph of $y = \sqrt{x}$, below the line
$y = 2$, and between the vertical lines $x = 0$, $x = 4$.

357. The region above the x-axis and below the graph of
$f(x) = x^2 - x^3$.

358. The region above the x-axis and below the graph of
$f(x) = 4x^2 - x^4$.

359. The region above the x-axis and below the graph of
$f(x) = 4 - x^4$.

360. The region above the x-axis, below the graph of
$f(x) = \sin x$, and between $x = 0$ and $x = \pi$.

361. The region above the x-axis, below the graph of
$f(x) = 1/(1 + x^2)$ (a curve known as Maria Agnesi’s
witch), and between $x = 0$ and $x = 1$.


364. Compute 


\J 1 — x 2 dx 


Jo 

without finding an antiderivative for $\sqrt{1 - x^2}$ (you can
find such an antiderivative, but it's not easy. This integral 
is the area of some region: which region is it, and what 
is that area?) 


365. Group Problem. 


Compute these integrals without finding antiderivatives.



5. The indefinite integral 


The fundamental theorem tells us that in order to compute the integral of some function $f$ over an
interval $[a, b]$ you should first find an antiderivative $F$ of $f$. In practice, much of the effort required to find an
integral goes into finding the antiderivative. In order to simplify the computation of the integral

(49) $$\int_a^b f(x)\,dx = F(b) - F(a)$$

the following notation is commonly used for the antiderivative:

(50) $$F(x) = \int f(x)\,dx.$$

For instance,
$$\int x^2\,dx = \tfrac{1}{3}x^3, \qquad \int \sin 5x\,dx = -\tfrac{1}{5}\cos 5x, \qquad \text{etc.}$$

The integral which appears here does not have the integration bounds $a$ and $b$. It is called an indefinite
integral, as opposed to the integral in (49) which is called a definite integral. You use the indefinite
integral if you expect the computation of the antiderivative to be a lengthy affair, and you do not want to
write the integration bounds $a$ and $b$ all the time.

It is important to distinguish between the two kinds of integrals. Here is a list of differences:
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Indefinite integral 

Definite integral

Indefinite: $\int f(x)\,dx$ is a function of $x$.
Definite: $\int_a^b f(x)\,dx$ is a number.

Indefinite: By definition $\int f(x)\,dx$ is any function of $x$ whose derivative is $f(x)$.
Definite: $\int_a^b f(x)\,dx$ was defined in terms of Riemann sums and can be interpreted as
“area under the graph of $y = f(x)$”, at least when $f(x) > 0$.

Indefinite: $x$ is not a dummy variable, for example, $\int 2x\,dx = x^2 + C$ and $\int 2t\,dt = t^2 + C$
are functions of different variables, so they are not equal.
Definite: $x$ is a dummy variable, for example, $\int_0^1 2x\,dx = 1$, and $\int_0^1 2t\,dt = 1$, so
$\int_0^1 2x\,dx = \int_0^1 2t\,dt$.


5.1. You can always check the answer. Suppose you want to find an antiderivative of a given
function $f(x)$ and after a long and messy computation which you don’t really trust you get an “answer”,
$F(x)$. You can then throw away the dubious computation and differentiate the $F(x)$ you had found. If
$F'(x)$ turns out to be equal to $f(x)$, then your $F(x)$ is indeed an antiderivative and your computation isn’t
important anymore.

For example, suppose that we want to find $\int \ln x\,dx$. My cousin Louie says it might be $F(x) = x\ln x - x$.
Let’s see if he’s right:
$$\frac{d}{dx}(x \ln x - x) = x \cdot \frac{1}{x} + 1 \cdot \ln x - 1 = \ln x.$$
Who knows how Louie thought of this$^1$, but it doesn’t matter: he’s right! We now know that $\int \ln x\,dx =
x \ln x - x + C$.
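The same check can be done numerically when differentiating by hand is itself error-prone. The sketch below compares a difference quotient of Louie's candidate with $\ln x$ at a few sample points; the helper names, the step `h` and the sample points are assumptions made for the illustration, and agreement at sample points is evidence rather than proof.

```python
import math

def candidate(x):
    """Louie's proposed antiderivative F(x) = x ln x - x."""
    return x * math.log(x) - x

def numerical_derivative(F, x, h=1e-6):
    """Symmetric difference quotient approximating F'(x)."""
    return (F(x + h) - F(x - h)) / (2.0 * h)

sample_points = [0.5, 1.0, 2.5, 10.0]
errors = [abs(numerical_derivative(candidate, x) - math.log(x)) for x in sample_points]
print(max(errors))  # small if the candidate really is an antiderivative of ln x
```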


5.2. About “+C”. Let $f(x)$ be a function defined on some interval $a \le x \le b$. If $F(x)$ is an
antiderivative of $f(x)$ on this interval, then for any constant $C$ the function $\tilde{F}(x) = F(x) + C$ will also be an
antiderivative of $f(x)$. So one given function $f(x)$ has many different antiderivatives, obtained by adding
different constants to one given antiderivative.


5.3. Theorem. If $F_1(x)$ and $F_2(x)$ are antiderivatives of the same function $f(x)$ on some interval
$a \le x \le b$, then there is a constant $C$ such that $F_1(x) = F_2(x) + C$.


PROOF. Consider the difference $G(x) = F_1(x) - F_2(x)$. Then $G'(x) = F_1'(x) - F_2'(x) = f(x) - f(x) = 0$,
so that $G(x)$ must be constant. Hence $F_1(x) - F_2(x) = C$ for some constant. □


It follows that there is some ambiguity in the notation $\int f(x)\,dx$. Two functions $F_1(x)$ and $F_2(x)$ can
both equal $\int f(x)\,dx$ without equaling each other. When this happens, they ($F_1$ and $F_2$) differ by a constant.
This can sometimes lead to confusing situations, e.g. you can check that

$$\int 2\sin x\cos x\,dx = \sin^2 x$$
$$\int 2\sin x\cos x\,dx = -\cos^2 x$$

are both correct. (Just differentiate the two functions $\sin^2 x$ and $-\cos^2 x$!) These two answers look different
until you realize that because of the trig identity $\sin^2 x + \cos^2 x = 1$ they really only differ by a constant:
$\sin^2 x = -\cos^2 x + 1$.
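As a quick illustration (a sketch added here, with sample points chosen arbitrarily), one can tabulate the difference of the two antiderivatives and see that it is the same constant everywhere:

```python
import math

def F1(x):
    return math.sin(x) ** 2

def F2(x):
    # note: ** binds tighter than unary minus, so this is -(cos x)^2
    return -math.cos(x) ** 2

def differences(points):
    """F1(x) - F2(x) at each sample point; expected to be the same constant."""
    return [F1(x) - F2(x) for x in points]

if __name__ == "__main__":
    print(differences([-2.0, 0.0, 0.7, 3.1]))
```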

To avoid this kind of confusion we will from now on never forget to include the
“arbitrary constant +C” in our answer when we compute an antiderivative.
$$\int f(x)\,dx = F(x) + C$$

$\int x^n\,dx = \dfrac{x^{n+1}}{n+1} + C$    for all $n \neq -1$
$\int \dfrac{1}{x}\,dx = \ln|x| + C$    (Note the absolute values)
$\int e^x\,dx = e^x + C$
$\int a^x\,dx = \dfrac{a^x}{\ln a} + C$    (don't memorize: use $a^x = e^{x\ln a}$)
$\int \sin x\,dx = -\cos x + C$
$\int \cos x\,dx = \sin x + C$
$\int \tan x\,dx = -\ln|\cos x| + C$    (Note the absolute values)
$\int \dfrac{1}{1+x^2}\,dx = \arctan x + C$
$\int \dfrac{1}{\sqrt{1-x^2}}\,dx = \arcsin x + C$

The following integral is also useful, but not as important as the ones above:

$$\int \frac{dx}{\cos x} = \frac12 \ln\frac{1+\sin x}{1-\sin x} + C \quad\text{for } -\frac{\pi}{2} < x < \frac{\pi}{2}.$$

Table 1. The list of the standard integrals everyone should know


Table 1 lists a number of antiderivatives which you should know. All of these integrals should be familiar
from the differentiation rules we have learned so far, except for the integrals of $\tan x$ and of $\frac{1}{\cos x}$. You
can check those by differentiation (using $\ln\frac{a}{b} = \ln a - \ln b$ simplifies things a bit).
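For readers who want a sanity check before differentiating by hand, here is a small numerical sketch (an addition, not from the original notes). It differentiates the two less familiar table entries with a central difference and compares against $\tan x$ and $1/\cos x$ on sample points inside $(-\pi/2, \pi/2)$. The function names and the sample points are assumptions made for illustration.

```python
import math

def derivative(F, x, h=1e-6):
    """Central-difference approximation of F'(x)."""
    return (F(x + h) - F(x - h)) / (2 * h)

def F_tan(x):
    # table entry: antiderivative of tan x
    return -math.log(abs(math.cos(x)))

def F_sec(x):
    # table entry: antiderivative of 1/cos x, valid for -pi/2 < x < pi/2
    return 0.5 * math.log((1 + math.sin(x)) / (1 - math.sin(x)))

def check(points):
    """Largest mismatch between F'(x) and the corresponding integrand."""
    worst = 0.0
    for x in points:
        worst = max(worst, abs(derivative(F_tan, x) - math.tan(x)))
        worst = max(worst, abs(derivative(F_sec, x) - 1 / math.cos(x)))
    return worst

if __name__ == "__main__":
    print(check([-1.2, -0.3, 0.4, 1.2]))
```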


6. Properties of the Integral 

Just as we had a list of properties for the limits and derivatives of sums and products of functions, the 
integral has similar properties. 

Suppose we have two functions f(x) and g(x) with antiderivatives F(x) and G(x), respectively. Then we 
know that 

$$\frac{d}{dx}\{F(x) + G(x)\} = F'(x) + G'(x) = f(x) + g(x),$$
in words, $F + G$ is an antiderivative of $f + g$, which we can write as

$$\int \{f(x) + g(x)\}\,dx = \int f(x)\,dx + \int g(x)\,dx. \tag{51}$$

Similarly, $\frac{d}{dx}\bigl(cF(x)\bigr) = cF'(x) = cf(x)$ implies that

$$\int cf(x)\,dx = c\int f(x)\,dx \tag{52}$$
if $c$ is a constant.

¹ He took math 222 and learned to integrate by parts.
These properties imply analogous properties for the definite integral. For any pair of functions on an
interval $[a, b]$ one has

$$\int_a^b [f(x) + g(x)]\,dx = \int_a^b f(x)\,dx + \int_a^b g(x)\,dx, \tag{53}$$

and for any function $f$ and constant $c$ one has

$$\int_a^b cf(x)\,dx = c\int_a^b f(x)\,dx. \tag{54}$$


Definite integrals have one other property for which there is no analog in indefinite integrals: if you split 
the interval of integration into two parts, then the integral over the whole is the sum of the integrals over the 
parts. The following theorem says it more precisely. 


6.1. Theorem. Given $a < c < b$, and a function $f$ on the interval $[a, b]$ then

$$\int_a^b f(x)\,dx = \int_a^c f(x)\,dx + \int_c^b f(x)\,dx. \tag{55}$$


PROOF. Let $F$ be an antiderivative of $f$. Then

$$\int_a^c f(x)\,dx = F(c) - F(a) \quad\text{and}\quad \int_c^b f(x)\,dx = F(b) - F(c),$$

so that

$$\int_a^b f(x)\,dx = F(b) - F(a) = F(b) - F(c) + F(c) - F(a) = \int_a^c f(x)\,dx + \int_c^b f(x)\,dx.$$

□


So far we have always assumed that $a < b$ in all definite integrals $\int_a^b \ldots$. The fundamental theorem
suggests that when $b < a$, we should define the integral as

$$\int_a^b f(x)\,dx = F(b) - F(a) = -\bigl(F(a) - F(b)\bigr) = -\int_b^a f(x)\,dx. \tag{56}$$

For instance,

$$\int_1^0 x\,dx = -\int_0^1 x\,dx = -\tfrac12.$$
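The two properties above, splitting the interval and reversing the bounds, can be observed numerically. The sketch below is an illustration added here, not part of the original notes: it uses a midpoint Riemann sum, and the helper name `integrate`, the number of subintervals, and the test function $x^2$ are assumptions. With $b < a$ the step `dx` is negative, which produces the sign change of (56) automatically.

```python
def integrate(f, a, b, n=10_000):
    """Midpoint Riemann sum for the integral of f from a to b (b < a is allowed)."""
    dx = (b - a) / n  # negative when b < a
    return sum(f(a + (k + 0.5) * dx) for k in range(n)) * dx

def split_gap(f, a, c, b):
    """Difference between the integral over [a, b] and the sum over [a, c] and [c, b]."""
    return integrate(f, a, b) - (integrate(f, a, c) + integrate(f, c, b))

if __name__ == "__main__":
    print(integrate(lambda x: x, 1.0, 0.0))            # close to -1/2
    print(split_gap(lambda x: x * x, 0.0, 1.0, 3.0))   # close to 0
```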


7. The definite integral as a function of its integration bounds 

Consider the expression

$$I = \int_0^x t^2\,dt.$$

What does $I$ depend on? To see this, you calculate the integral and you find

$$I = \left[\tfrac13 t^3\right]_0^x = \tfrac13 x^3 - \tfrac13 0^3 = \tfrac13 x^3.$$

So the integral depends on $x$. It does not depend on $t$, since $t$ is a “dummy variable” (see §3.3 where we
already discussed this point.)

In this way you can use integrals to define new functions. For instance, we could define

$$I(x) = \int_0^x t^2\,dt,$$

which would be a roundabout way of defining the function $I(x) = x^3/3$. Again, since $t$ is a dummy variable
we can replace it by any other variable we like. Thus

$$I(x) = \int_0^x \alpha^2\,d\alpha$$

defines the same function (namely, $I(x) = \tfrac13 x^3$).

The previous example does not define a new function ($I(x) = x^3/3$). An example of a new function
defined by an integral is the “error-function” from statistics. It is given by

$$\operatorname{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-t^2}\,dt, \tag{57}$$

so $\operatorname{erf}(x)$ is the area of the shaded region in figure 3. The integral in (57) cannot be computed in terms of the
standard functions (square and higher roots, sine, cosine, exponential and logarithms). Since the integral in
(57) occurs very often in statistics (in relation with the so-called normal distribution) it has been given a
name, namely, “$\operatorname{erf}(x)$”.
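Even though the integral in (57) has no formula in terms of standard functions, it is easy to approximate. The sketch below (an addition for illustration) evaluates (57) with a midpoint Riemann sum and compares the result with `math.erf` from Python's standard library. The name `erf_numeric` and the number of subintervals are assumptions.

```python
import math

def erf_numeric(x, n=10_000):
    """Approximate (2/sqrt(pi)) * integral of exp(-t^2) from 0 to x by a midpoint sum."""
    dt = x / n
    total = sum(math.exp(-((k + 0.5) * dt) ** 2) for k in range(n))
    return 2 / math.sqrt(math.pi) * total * dt

if __name__ == "__main__":
    for x in (0.5, 1.0, 2.0):
        # the two columns should agree to many digits
        print(x, erf_numeric(x), math.erf(x))
```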


How do you differentiate a function that is defined by an integral? The answer is simple, for if $f(x) = F'(x)$
then the fundamental theorem says that

$$\int_a^x f(t)\,dt = F(x) - F(a),$$

and therefore

$$\frac{d}{dx}\int_a^x f(t)\,dt = \frac{d}{dx}\{F(x) - F(a)\} = F'(x) = f(x),$$

i.e.

$$\frac{d}{dx}\int_a^x f(t)\,dt = f(x).$$

A similar calculation gives you

$$\frac{d}{dx}\int_x^b f(t)\,dt = -f(x).$$

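So what is the derivative of the error function? We have

$$\operatorname{erf}'(x) = \frac{d}{dx}\left\{\frac{2}{\sqrt{\pi}}\int_0^x e^{-t^2}\,dt\right\} = \frac{2}{\sqrt{\pi}}\,\frac{d}{dx}\int_0^x e^{-t^2}\,dt = \frac{2}{\sqrt{\pi}}\,e^{-x^2}.$$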
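The rule $\frac{d}{dx}\int_a^x f(t)\,dt = f(x)$ can be illustrated with the error function. This sketch (added here, with an assumed step size and sample points) differentiates the library function `math.erf` by a central difference and compares it with $\frac{2}{\sqrt\pi}e^{-x^2}$:

```python
import math

def derivative(F, x, h=1e-5):
    """Central-difference approximation of F'(x)."""
    return (F(x + h) - F(x - h)) / (2 * h)

def erf_prime(x):
    # the formula obtained from the fundamental theorem
    return 2 / math.sqrt(math.pi) * math.exp(-x * x)

def max_gap(points):
    """Largest difference between the numerical derivative and the formula."""
    return max(abs(derivative(math.erf, x) - erf_prime(x)) for x in points)

if __name__ == "__main__":
    print(max_gap([-1.5, -0.2, 0.0, 0.8, 2.0]))
```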
8. Method of substitution


The chain rule says that

$$\frac{dF(G(x))}{dx} = F'(G(x))\cdot G'(x),$$

so that

$$\int F'(G(x))\cdot G'(x)\,dx = F(G(x)) + C.$$

8.1. Example. Consider the function $f(x) = 2x\sin(x^2 + 3)$. It does not appear in the list of standard
antiderivatives we know by heart. But we do notice² that $2x = \frac{d}{dx}(x^2 + 3)$. So let's call $G(x) = x^2 + 3$, and
$F(u) = -\cos u$, then

$$F(G(x)) = -\cos(x^2 + 3)$$

and

$$\frac{dF(G(x))}{dx} = \underbrace{\sin(x^2 + 3)}_{F'(G(x))}\cdot\underbrace{2x}_{G'(x)} = f(x),$$

so that

$$\int 2x\sin(x^2 + 3)\,dx = -\cos(x^2 + 3) + C. \tag{58}$$


8.2. Leibniz' notation for substitution. The most transparent way of computing an integral by
substitution is by following Leibniz and introducing new variables. Thus to do the integral

$$\int f(G(x))G'(x)\,dx$$

where $f(u) = F'(u)$, we introduce the substitution $u = G(x)$, and agree to write

$$du = dG(x) = G'(x)\,dx.$$

Then we get

$$\int f(G(x))G'(x)\,dx = \int f(u)\,du = F(u) + C.$$

At the end of the integration we must remember that $u$ really stands for $G(x)$, so that

$$\int f(G(x))G'(x)\,dx = F(u) + C = F(G(x)) + C.$$

As an example, let's do the integral (58) using Leibniz' notation. We want to find

$$\int 2x\sin(x^2 + 3)\,dx$$

and decide to substitute $z = x^2 + 3$ (the substitution variable doesn't always have to be called $u$). Then we
compute

$$dz = d(x^2 + 3) = 2x\,dx \quad\text{and}\quad \sin(x^2 + 3) = \sin z,$$

so that

$$\int 2x\sin(x^2 + 3)\,dx = \int \sin z\,dz = -\cos z + C.$$

Finally we get rid of the substitution variable $z$, and we find

$$\int 2x\sin(x^2 + 3)\,dx = -\cos(x^2 + 3) + C.$$
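A result obtained by substitution can be tested against a direct numerical integration. In this sketch (an illustration added here) the interval $[0, 1]$, the helper `integrate`, and the number of subintervals are assumptions; the antiderivative is the one found above.

```python
import math

def integrate(f, a, b, n=10_000):
    """Midpoint Riemann sum for the integral of f from a to b."""
    dx = (b - a) / n
    return sum(f(a + (k + 0.5) * dx) for k in range(n)) * dx

def integrand(x):
    return 2 * x * math.sin(x * x + 3)

def antiderivative(x):
    # result of the substitution z = x^2 + 3
    return -math.cos(x * x + 3)

def gap(a, b):
    """Numerical integral minus F(b) - F(a); should be close to zero."""
    return integrate(integrand, a, b) - (antiderivative(b) - antiderivative(a))

if __name__ == "__main__":
    print(gap(0.0, 1.0))
```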

When we do integrals in this calculus class, we always get rid of the substitution variable because it is a
variable we invented, and which does not appear in the original problem. But if you are doing an integral which
appears in some longer discussion of a real-life (or real-lab) situation, then it may be that the substitution
variable actually has a meaning (e.g. “the effective stoichiometric modality of CQF self-inhibition”) in which
case you may want to skip the last step and leave the integral in terms of the (meaningful) substitution
variable.

² You will start noticing things like this after doing several examples.


8.3. Substitution for definite integrals. For definite integrals the chain rule

$$\frac{d}{dx}\bigl(F(G(x))\bigr) = F'(G(x))G'(x) = f(G(x))G'(x)$$

implies

$$\int_a^b f(G(x))G'(x)\,dx = F(G(b)) - F(G(a)),$$

which you can also write as

$$\int_{x=a}^{b} f(G(x))G'(x)\,dx = \int_{u=G(a)}^{G(b)} f(u)\,du. \tag{59}$$

8.4. Example of substitution in a definite integral. Let's compute

$$\int_0^1 \frac{x}{1 + x^2}\,dx$$

using the substitution $u = G(x) = 1 + x^2$. Since $du = 2x\,dx$, the associated indefinite integral is

$$\int \frac{1}{1 + x^2}\,x\,dx = \frac12\int \frac{1}{u}\,du.$$

To find the definite integral you must compute the new integration bounds $G(0)$ and $G(1)$ (see equation (59).)
If $x$ runs between $x = 0$ and $x = 1$, then $u = G(x) = 1 + x^2$ runs between $u = 1 + 0^2 = 1$ and $u = 1 + 1^2 = 2$,
so the definite integral we must compute is

$$\int_0^1 \frac{x}{1 + x^2}\,dx = \frac12\int_1^2 \frac{1}{u}\,du, \tag{60}$$

which is in our list of memorable integrals. So we find

$$\frac12\int_1^2 \frac{1}{u}\,du = \frac12\bigl[\ln u\bigr]_1^2 = \frac12\ln 2.$$

Sometimes the integrals in (60) are written as

$$\int_{x=0}^{1} \frac{x}{1 + x^2}\,dx = \frac12\int_{u=1}^{2} \frac{1}{u}\,du,$$

to emphasize (and remind yourself) to which variable the bounds in the integral refer.
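The change of bounds can be double-checked numerically. The sketch below (an addition, not from the original notes) evaluates the $x$-integral over $[0, 1]$ and the $u$-integral over $[1, 2]$ with a midpoint Riemann sum and compares both with $\frac12\ln 2$. The helper `integrate` and its default number of subintervals are assumptions.

```python
import math

def integrate(f, a, b, n=10_000):
    """Midpoint Riemann sum for the integral of f from a to b."""
    dx = (b - a) / n
    return sum(f(a + (k + 0.5) * dx) for k in range(n)) * dx

def x_side():
    # original integral, bounds refer to x
    return integrate(lambda x: x / (1 + x * x), 0.0, 1.0)

def u_side():
    # after u = 1 + x^2, bounds refer to u
    return 0.5 * integrate(lambda u: 1 / u, 1.0, 2.0)

if __name__ == "__main__":
    print(x_side(), u_side(), 0.5 * math.log(2))
```

Forgetting to change the bounds (integrating $1/u$ from 0 to 1) would not even give a finite number, which is one way to notice the mistake.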


9. Exercises 


Compute these derivatives:


366. $\dfrac{d}{dx}\displaystyle\int_0^x (1 + t^2)^4\,dt$

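367. $\dfrac{d}{dx}\displaystyle\int_x^1 \ln z\,dz$

368. $\dfrac{d}{dt}\displaystyle\int_0^t \frac{dx}{1 + x^2}$

369. $\dfrac{d}{dt}\displaystyle\int_0^{1/t} \frac{dx}{1 + x^2}$

370. $\dfrac{d}{dx}\displaystyle\int_x^{2x} s^2\,ds$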


371. $\dfrac{d}{dq}\displaystyle\int_{-q}^{q} \frac{dx}{1 - x^2}$ [Which values of $q$ are allowed here?]

372. $\dfrac{d}{dt}\displaystyle\int_0^{t^2} e^{2x}\,dx$

373. Group Problem. 

You can see the graph of the error function at
http://en.Wikipedia.org/wiki/Error_function

(a) Compute the second derivative of the error function. How many inflection points does the graph of the
error function have?

(b) The graph of the error function on Wikipedia shows that $\operatorname{erf}(x)$ is negative when $x < 0$. But the error
function is defined as an integral of a positive function so it should be positive. Is Wikipedia wrong? Explain.

Compute the following indefinite integrals: 

374. $\displaystyle\int \{6x^5 - 2x^{-4} - 7x\}\,dx$

375. $\displaystyle\int \{+3/x - 5 + 4e^x + 7^x\}\,dx$

376. $\displaystyle\int (x/a + a/x + x^a + a^x + ax)\,dx$


377. J {y/x — v x 4 H 

378. $\displaystyle\int \{2^x + (\tfrac12)^x\}\,dx$


7 


— 6e x + 1} dx 


379. 


J (3a; — 5) dx 


I -2 
r 4 


380. 

J x 2 dx 

(hm. 

381. 

j\- 2 dt 

(0 

382. 

j\~ 2 dt 

(!!!) 

383. 

(\l~2x 

— 3a; 2 ) dx 


384. 

385. 


r 2 

J (5a; 2 — 4a; + 3) dx 
J (5 y 4 - 6 y 2 + 14) dy 


I - 3 
r-i 


386. I (y 9 - 2y 5 + 3y) dy 
Jo 


387. / \fxdx 
Jo 

388. [ x 3/7 dx 

389. 1,1 1 

390. 

391. 


,i v t 2 t 4 

2 f 6 _ f 2 

dt 


dt 


/1 t 4 

r 2 x 2 + i 

h \/x 

r2 


dx 


392. 


f (x 3 — l) 2 dx 

J o 

393. f u{\/u+\/u)du 
Jo 


394. j (x+l/x) 2 dx 


395. 

396. 

397. 

398. 

399. 

400. 

401. 


r3 

J y/x 5 +2 dx 


(x — l)(3a; + 2) dx 


J (Vt — 2/y/t) dt 

L (^ + w) dr 


(x + 1) dx 
^ ” 1 dx 


L 


i 

-2 „4 


dx 



405. 

406. 

407. 

408. 

409. 

410. 

411. 

412. 

413. 


H r/3 

7r/4 

f*7r/2 

( 

k/2 


1 0 


sin t dt 

(cos 9 + 2 sin 9) d9 

2 

(cos 9 + sin 29) d9 
tana; 


dx 


I 2tt/3 

/■tt/2 


cot a; 


In /3 sin a; 

r+3 6 

/l 1 + X 2 

r°- 5 dx 

1 0 V1- X 

f (1 /x)dx 


dx 


dx 


pin 6 

J In 3 


8e x dx 


414. / 2* dt 


415. 

416. 


-e o 

— dx 
2 x 


L 

r 3 

J \x 2 — 1| dx 
■2


417. 

J \x — x 2 \ dx 

418. 

J (x — 2\x\) dx 

419. 

[ (® 2 — |® — 1|) dx 
Jo 

420. 

/ /(®) dx where 

Jo 

421. 

/*7T 

/ f(x) dx where 

J —7T 


/(*) = 


x 4 if 0 < * < 1, 
x 5 , if 1 < x < 2. 


/(*) = 


sin a;, if 0 < x < tv. 


422. Compute

$$I = \int_0^2 2x(1 + x^2)^3\,dx$$

in two different ways:

(a) Expand $(1 + x^2)^3$, multiply with $2x$, and integrate each term.

(b) Use the substitution $u = 1 + x^2$.

423. Compute

$$I_n = \int 2x(1 + x^2)^n\,dx.$$

424. If $f'(x) = x - 1/x^2$ and $f(1) = 1/2$ find $f(x)$.

425. Sketch the graph of the curve $y = \sqrt{x + 1}$ and determine the area of the region enclosed by the curve, the
$x$-axis and the lines $x = 0$, $x = 4$.

426. Find the area under the curve $y = \sqrt{6x + 4}$ and above the $x$-axis between $x = 0$ and $x = 2$. Draw a sketch of
the curve.

427. Graph the curve $y = 2\sqrt{1 - x^2}$, $x \in [0, 1]$, and find the area enclosed between the curve and the $x$-axis. (Don't
evaluate the integral, but compare with the area under the graph of $y = \sqrt{1 - x^2}$.)

428. Determine the area under the curve $y = \sqrt{a^2 - x^2}$ and between the lines $x = 0$ and $x = a$.

429. Graph the curve $y = 2\sqrt{9 - x^2}$ and determine the area enclosed between the curve and the $x$-axis.

430. Graph the area between the curve $y^2 = 4x$ and the line $x = 3$. Find the area of this region.

431. Find the area bounded by the curve $y = 4 - x^2$ and the lines $y = 0$ and $y = 3$.

432. Find the area enclosed between the curve $y = \sin 2x$, $0 \le x \le \pi/4$ and the axes.

433. Find the area enclosed between the curve $y = \cos 2x$, $0 \le x \le \pi/4$ and the axes.

434. Graph $y^2 + 1 = x$, and find the area enclosed by the curve and the line $x = 2$.

435. Find the area of the region bounded by the parabola $y^2 = 4x$ and the line $y = 2x$.

436. Find the area bounded by the curve $y = x(2 - x)$ and the line $x = 2y$.

437. Find the area bounded by the curve $x^2 = 4y$ and the line $x = 4y - 2$.

438. Calculate the area of the region bounded by the parabolas $y = x^2$ and $x = y^2$.

439. Find the area of the region included between the parabola $y^2 = x$ and the line $x + y = 2$.

440. Find the area of the region bounded by the curves $y = \sqrt{x}$ and $y = x$.

441. Group Problem. 

You asked your assistant Joe to produce graphs of a function $f(x)$, its derivative $f'(x)$ and an antiderivative
$F(x)$ of $f(x)$.

Unfortunately Joe simply labelled the graphs “A,” “B,” and “C,” and now he doesn't remember which graph
is $f$, which is $f'$ and which is $F$. Identify which graph is which and explain your answer.



442. Group Problem.

Below is the graph of a function $y = f(x)$.



The function $F(x)$ (graph not shown) is an antiderivative of $f(x)$. Which among the following statements are true?

(a) $F(a) = F(c)$

(b) $F(b) = 0$

(c) $F(b) > F(c)$

(d) The graph of $y = F(x)$ has two inflection points.

Use a substitution to evaluate the following integrals.
443.

444. 

445. 

446. 

447. 

448. 

449. 

450. 


f 2 u du 
h 1 + u 2 
f 5 x dx 


I o y/x + 1 
j' 2 x 2 dx 

I I v / 2aT+T 
r 5 sds 

/o ^+2 

f 2 xdx 


h 1 + x 2 

/*7T 

/ cos(d + |)d# 

'o 

f . 7T + * , 

/ sm —-—da; 

f sin 2* 

/ v 7 1 + cos 2* 


da; 


/' 7r/3 2 

451. / sin dcos#d(9 

»/7r/4 


452. 

453. 

454. 

455. 

456. 

457. 


1 


, dr 


r lnr 
sin 2a; 

1 + cos 2 x 
sin 2x 
1 + sin * 


dx 

dx 


f z\J 1 — z 2 dz 

Jo 


In 2x 


dx 


f £(l + 2£ 2 ) 10 d£ 

J t =o 

r 3 

458. / sin p (cos 2 p) 4 dp 


459. J ae a da 

i 

460. f ** dl 
CHAPTER 8


Applications of the integral 


The integral appears as the answer to many different questions. In this chapter we will describe a number 
of “things which are an integral.” In each example there is a quantity we want to compute, and which we can 
approximate through Riemann-sums. After letting the partition become arbitrarily fine we then find that the 
quantity we are looking for is given by an integral. The derivations are an important part of the subject. 


1. Areas between graphs 

Suppose you have two functions $f$ and $g$ on an interval $[a, b]$, one of which is always larger than the other,
i.e. for which you know that $f(x) \le g(x)$ for all $x$ in the interval $[a, b]$. Then the area of the region between
the graphs of the two functions is

$$\text{Area} = \int_a^b \bigl(g(x) - f(x)\bigr)\,dx. \tag{61}$$

To get this formula you approximate the region by a large number of thin rectangles. Choose a partition
$a = x_0 < x_1 < \cdots < x_n = b$ of the interval $[a, b]$; choose a number $c_k$ in each interval $[x_{k-1}, x_k]$; form the
rectangles

$$x_{k-1} \le x \le x_k, \qquad f(c_k) \le y \le g(c_k).$$

The area of this rectangle is

$$\text{width} \times \text{height} = \Delta x_k \times \bigl(g(c_k) - f(c_k)\bigr).$$

Hence the combined area of the rectangles is

$$R = \bigl(g(c_1) - f(c_1)\bigr)\Delta x_1 + \cdots + \bigl(g(c_n) - f(c_n)\bigr)\Delta x_n$$

which is just the Riemann-sum for the integral

$$I = \int_a^b \bigl(g(x) - f(x)\bigr)\,dx.$$

Figure 1. Finding the area between two graphs using Riemann-sums.

So,

(1) since the area of the region between the graphs of $f$ and $g$ is the limit of the combined areas of the
rectangles,

(2) and since this combined area is equal to the Riemann sum $R$,

(3) and since the Riemann-sums $R$ converge to the integral $I$,

we conclude that the area between the graphs of $f$ and $g$ is exactly the integral $I$.
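The rectangle construction translates directly into a few lines of code. The following sketch is an illustration added here: the particular functions $f(x) = x^2$ and $g(x) = x + 2$ on $[-1, 2]$ are an assumed example (they satisfy $f \le g$ there, and the exact area is $9/2$), and each $c_k$ is taken to be the midpoint of its subinterval.

```python
def area_between(f, g, a, b, n=10_000):
    """Riemann sum R = sum of (g(c_k) - f(c_k)) * dx with c_k the midpoint of each subinterval."""
    dx = (b - a) / n
    total = 0.0
    for k in range(n):
        c_k = a + (k + 0.5) * dx          # sample point in [x_{k-1}, x_k]
        total += (g(c_k) - f(c_k)) * dx   # width times height of one thin rectangle
    return total

def f(x):
    return x * x      # lower graph (assumed example)

def g(x):
    return x + 2      # upper graph (assumed example)

if __name__ == "__main__":
    # the sums approach 4.5 as the partition gets finer
    for n in (10, 100, 1000):
        print(n, area_between(f, g, -1.0, 2.0, n))
```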

2. Exercises 


461. Find the area of the region bounded by the parabola $y^2 = 4x$ and the line $y = 2x$.

462. Find the area bounded by the curve $y = x(2 - x)$ and the line $x = 2y$.

463. Find the area bounded by the curve $x^2 = 4y$ and the line $x = 4y - 2$.

464. Calculate the area of the region bounded by the parabolas $y = x^2$ and $x = y^2$.

465. Find the area of the region included between the parabola $y^2 = x$ and the line $x + y = 2$.

466. Find the area of the region bounded by the curves $y = \sqrt{x}$ and $y = x$.

467. Use integration to find the area of the triangular region bounded by the lines $y = 2x + 1$, $y = 3x + 1$ and $x = 4$.

468. Find the area bounded by the parabola $x^2 - 2 = y$ and the line $x + y = 0$.

469. Where do the graphs of $f(x) = x^2$ and $g(x) = 3/(2 + x^2)$ intersect? Find the area of the region which
lies above the graph of $f$ and below the graph of $g$.

(Hint: if you need to integrate $1/(2 + x^2)$ you could substitute $x = u\sqrt{2}$.)

470. Graph the curve $y = (1/2)x^2 + 1$ and the straight line $y = x + 1$ and find the area between the curve and the
line.

471. Find the area of the region between the parabolas $y^2 = x$ and $x^2 = 16y$.

472. Find the area of the region enclosed by the parabola $y^2 = 4ax$ and the line $y = mx$.

473. Find $a$ so that the curves $y = x^2$ and $y = a\cos x$ intersect at the points $(x, y) = (\frac{\pi}{4}, \frac{\pi^2}{16})$. Then find the
area between these curves.

474. Group Problem.

Write a definite integral whose value is the area of the region between the two circles $x^2 + y^2 = 1$ and
$(x - 1)^2 + y^2 = 1$. Find this area. If you cannot evaluate the integral by calculus you may use geometry to find
the area. Hint: The part of a circle cut off by a line is a circular sector with a triangle removed.


3. Cavalieri’s principle and volumes of solids 

You can use integration to derive the formulas for volumes of spheres, cylinders, cones, and many many
more solid objects in a systematic way. In this section we'll see the “method of slicing.”

3.1. Example: Volume of a pyramid. As an example let's compute the volume of a pyramid whose
base is a square of side 1, and whose height is 1. Our strategy will be to divide the pyramid into thin
horizontal slices whose volumes we can compute, and to add the volumes of the slices to get the volume of
the pyramid.

To construct the slices we choose a partition of the (height) interval $[0, 1]$ into $N$ subintervals, i.e. we
pick numbers

$$0 = x_0 < x_1 < x_2 < \cdots < x_{N-1} < x_N = 1,$$

and as usual we set $\Delta x_k = x_k - x_{k-1}$, and we define the mesh size of the partition to be the largest of the $\Delta x_k$.

The $k$th slice consists of those points on the pyramid whose height is between $x_{k-1}$ and $x_k$. The
intersection of the pyramid with the plane at height $x$ is a square, and by similarity the length of the side of
this square is $1 - x$. Therefore the bottom of the $k$th slice is a square with side $1 - x_{k-1}$, and its top is a
square with side $1 - x_k$. The height of the slice is $x_k - x_{k-1} = \Delta x_k$.

Thus the $k$th slice contains a block of height $\Delta x_k$ whose base is a square with sides $1 - x_k$, and its volume
must therefore be larger than $(1 - x_k)^2\Delta x_k$. On the other hand the $k$th slice is contained in a block of the
same height whose base is a square with sides $1 - x_{k-1}$. The volume of the slice is therefore not more than
$(1 - x_{k-1})^2\Delta x_k$. So we have

$$(1 - x_k)^2\Delta x_k < \text{volume of } k\text{th slice} \le (1 - x_{k-1})^2\Delta x_k.$$
Figure 2. The slice at height $x$ is a square with side $1 - x$.

Therefore there is some $c_k$ in the interval $[x_{k-1}, x_k]$ such that

$$\text{volume of } k\text{th slice} = (1 - c_k)^2\Delta x_k.$$

Adding the volumes of the slices we find that the volume $V$ of the pyramid is given by

$$V = (1 - c_1)^2\Delta x_1 + \cdots + (1 - c_N)^2\Delta x_N.$$

The right hand side in this equation is a Riemann sum for the integral

$$I = \int_0^1 (1 - x)^2\,dx$$

and therefore we have

$$I = \lim\{(1 - c_1)^2\Delta x_1 + \cdots + (1 - c_N)^2\Delta x_N\} = V.$$

Compute the integral and you find that the volume of the pyramid is

$$V = \int_0^1 (1 - x)^2\,dx = \tfrac13.$$
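The squeeze between the inner and outer blocks can be watched numerically. This sketch is an illustration added here; it assumes a uniform partition with $N$ slices (the argument above allows any partition) and computes the total volume of the inner blocks and of the outer blocks.

```python
def pyramid_bounds(N):
    """Lower and upper bounds for the pyramid volume from N slices of equal thickness."""
    dx = 1.0 / N
    # inner blocks use the top square (side 1 - x_k), outer blocks the bottom square (side 1 - x_{k-1})
    lower = sum((1 - k * dx) ** 2 * dx for k in range(1, N + 1))
    upper = sum((1 - (k - 1) * dx) ** 2 * dx for k in range(1, N + 1))
    return lower, upper

if __name__ == "__main__":
    for N in (10, 100, 1000):
        # both bounds close in on 1/3 as N grows
        print(N, pyramid_bounds(N))
```

For a uniform partition the gap between the two bounds is exactly $1/N$, so both sums are forced to the same limit.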



3.2. General case. The “method of slicing” which we just used to compute the volume of a pyramid
works for solids of any shape. The strategy always consists of dividing the solid into many thin (horizontal)
slices, computing their volumes, and recognizing that the total volume of the slices is a Riemann sum for some
integral. That integral then is the volume of the solid.

To be more precise, let $a$ and $b$ be the heights of the lowest and highest points on the solid, and let
$a = x_0 < x_1 < x_2 < \cdots < x_{N-1} < x_N = b$ be a partition of the interval $[a, b]$. Such a partition divides the
solid into $N$ distinct slices, where slice number $k$ consists of all points in the solid whose height is between
$x_{k-1}$ and $x_k$. The thickness of the $k$th slice is $\Delta x_k = x_k - x_{k-1}$. If

$$A(x) = \text{area of the intersection of the solid with the plane at height } x,$$

then we can approximate the volume of the $k$th slice by

$$A(c_k)\,\Delta x_k$$

where $c_k$ is any number (height) between $x_{k-1}$ and $x_k$.
Figure 3. Slicing a solid to compute its volume. The volume of one slice is approximately the product
of its thickness ($\Delta x$) and the area $A(x)$ of its top. Summing the volume $A(x)\Delta x$ over all slices leads
approximately to the integral $\int_a^b A(x)\,dx$.


The total volume of all slices is therefore approximately

$$V \approx A(c_1)\Delta x_1 + \cdots + A(c_N)\Delta x_N.$$

While this formula only holds approximately, we expect the approximation to get better as we make the
partition finer, and thus

$$V = \lim_{\cdots}\{A(c_1)\Delta x_1 + \cdots + A(c_N)\Delta x_N\}. \tag{62}$$

On the other hand the sum on the right is a Riemann sum for the integral $I = \int_a^b A(x)\,dx$, so the limit is
exactly this integral. Therefore we have

$$V = \int_a^b A(x)\,dx. \tag{63}$$




Figure 4. Cavalieri’s principle. Both solids consist of a pile of horizontal slices. The solid on the right 
was obtained from the solid on the left by sliding some of the slices to the left and others to the right. 

This operation does not affect the volumes of the slices, and hence both solids have the same volume. 

3.3. Cavalieri’s principle. The formula (63) for the volume of a solid which we have just derived 
shows that the volume only depends on the areas A(x) of the cross sections of the solid, and not on the 
particular shape these cross sections may have. This observation is older than calculus itself and goes back at 
least to Bonaventura Cavalieri (1598 - 1647) who said: If the intersections of two solids with a horizontal 
plane always have the same area, no matter what the height of the horizontal plane may be, then the two 
solids have the same volume. 

This principle is often illustrated by considering a stack of coins: If you put a number of coins on top of 
each other then the total volume of the coins is just the sum of the volumes of the coins. If you change the 
shape of the pile by sliding the coins horizontally then the volume of the pile will still be the sum of the 
volumes of the coins, i.e. it doesn’t change. 
3.4. Solids of revolution. In principle, formula (63) allows you to compute the volume of any solid,
provided you can compute the areas $A(x)$ of all cross sections. One class of solids for which the areas of the
cross sections are easy are the so-called “solids of revolution.”

Figure 5. A solid of revolution consists of all points in three-dimensional space whose distance $r$ to the
$x$-axis satisfies $r \le f(x)$.

A solid of revolution is created by rotating (revolving) the graph of a positive function around the $x$-axis.
More precisely, let $f$ be a function which is defined on an interval $[a, b]$ and which is always positive ($f(x) > 0$
for all $x$). If you now imagine the $x$-axis floating in three dimensional space, then the solid of revolution
obtained by rotating the graph of $f$ around the $x$-axis consists of all points in three-dimensional space with
$a \le x \le b$, and whose distance to the $x$-axis is no more than $f(x)$.

Yet another way of describing the solid of revolution is to say that the solid is the union of all discs which
meet the $x$-axis perpendicularly and whose radius is given by $r = f(x)$.

If we slice the solid with planes perpendicular to the $x$-axis, then (63) tells us the volume of the solid.
Each slice is a disc of radius $r = f(x)$ so that its area is $A(x) = \pi r^2 = \pi f(x)^2$. We therefore find that

$$V = \pi\int_a^b f(x)^2\,dx. \tag{64}$$
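Formula (64) is straightforward to evaluate numerically. The sketch below is an illustration added here: the profile $f(x) = e^x$ on $[0, 1]$ is an assumed example (its exact volume is $\pi(e^2 - 1)/2$), and the helper names are not from the original notes.

```python
import math

def integrate(f, a, b, n=10_000):
    """Midpoint Riemann sum for the integral of f from a to b."""
    dx = (b - a) / n
    return sum(f(a + (k + 0.5) * dx) for k in range(n)) * dx

def volume_of_revolution(f, a, b):
    """Volume obtained by rotating the graph of f around the x-axis: pi * integral of f(x)^2."""
    return math.pi * integrate(lambda x: f(x) ** 2, a, b)

if __name__ == "__main__":
    print(volume_of_revolution(math.exp, 0.0, 1.0))
    print(math.pi * (math.e ** 2 - 1) / 2)   # exact value for the assumed example
```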


4. Examples of volumes of solids of revolution 
4.1. Problem 1: Revolve $\mathcal{R}$ around the $y$-axis. Consider the solid obtained by revolving the region

$$\mathcal{R} = \{(x, y) \mid 0 \le x \le 2,\ (x - 1)^2 \le y \le 1\}$$

around the $y$-axis.
Figure 6. Computing the volume of the solid you get when you revolve the region $\mathcal{R}$ around the $y$-axis.
A horizontal cross section of the solid is a “washer” with inner radius $r_{\text{in}}$, and outer radius $r_{\text{out}}$.


Solution: The region we have to revolve around the $y$-axis consists of all points above the parabola
$y = (x - 1)^2$ but below the line $y = 1$.

If we intersect the solid with a plane at height $y$ then we get a ring shaped region, or “annulus”, i.e. a
large disc with a smaller disc removed. You can see it in the figure below: if you cut the region $\mathcal{R}$ horizontally
at height $y$ you get the line segment $AB$, and if you rotate this segment around the $y$-axis you get the grey
ring region pictured below the graph. Call the radius of the outer circle $r_{\text{out}}$ and the radius of the inner circle
$r_{\text{in}}$. These radii are the two solutions of

$$y = (1 - r)^2$$

so they are

$$r_{\text{in}} = 1 - \sqrt{y}, \qquad r_{\text{out}} = 1 + \sqrt{y}.$$

The area of the cross section is therefore given by

$$A(y) = \pi r_{\text{out}}^2 - \pi r_{\text{in}}^2 = \pi(1 + \sqrt{y})^2 - \pi(1 - \sqrt{y})^2 = 4\pi\sqrt{y}.$$

















Problem 2: Rotate the line segment AB around 
the vertical line x = — 1 and you get a washer. 



Problem 3: If you rotate the line segment AB around 
the horizontal line y = 2 you once again get a washer. 


The $y$-values which occur in the solid are $0 \le y \le 1$ and hence the volume of the solid is given by

$$V = \int_0^1 A(y)\,dy = 4\pi\int_0^1 \sqrt{y}\,dy = 4\pi \times \tfrac23 = \frac{8\pi}{3}.$$
Jo Jo 6 

4.2. Problem 2: Revolve $\mathcal{R}$ around the line $x = -1$. Find the volume of the solid of revolution
obtained by revolving the same region $\mathcal{R}$ around the line $x = -1$.

Solution: The line $x = -1$ is vertical, so we slice the solid with horizontal planes. The height of each
plane will be called $y$.

As before the slices are ring shaped regions but the inner and outer radii are now given by

$$r_{\text{in}} = 1 + x_{\text{in}} = 2 - \sqrt{y}, \qquad r_{\text{out}} = 1 + x_{\text{out}} = 2 + \sqrt{y}.$$

The volume is therefore given by

$$V = \int_0^1 \bigl(\pi r_{\text{out}}^2 - \pi r_{\text{in}}^2\bigr)\,dy = \pi\int_0^1 8\sqrt{y}\,dy = \frac{16\pi}{3}.$$

4.3. Problem 3: Revolve $\mathcal{R}$ around the line $y = 2$. Compute the volume of the solid you get when
you revolve the same region $\mathcal{R}$ around the line $y = 2$.

Solution: This time the line around which we rotate $\mathcal{R}$ is horizontal, so we slice the solid with planes
perpendicular to the $x$-axis.

A typical slice is obtained by revolving the line segment $AB$ about the line $y = 2$. The result is again an
annulus, and from the figure we see that the inner and outer radii of the annulus are

$$r_{\text{in}} = 1, \qquad r_{\text{out}} = 2 - (1 - x)^2.$$

The area of the slice is therefore

$$A(x) = \pi\{2 - (1 - x)^2\}^2 - \pi 1^2 = \pi\{3 - 4(1 - x)^2 + (1 - x)^4\}.$$

The $x$ values which occur in the solid are $0 \le x \le 2$, and so its volume is

$$V = \pi\int_0^2 \{3 - 4(1 - x)^2 + (1 - x)^4\}\,dx = \pi\left[3x + \tfrac43(1 - x)^3 - \tfrac15(1 - x)^5\right]_0^2 = \frac{56\pi}{15}.$$
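The three washer computations can be cross-checked numerically. The sketch below is an addition for illustration: it integrates the washer areas with a midpoint Riemann sum (the helper names and the number of subintervals are assumptions) and the results can be compared with the values $8\pi/3$, $16\pi/3$ and $56\pi/15$ obtained by evaluating the integrals by hand.

```python
import math

def integrate(f, a, b, n=20_000):
    """Midpoint Riemann sum for the integral of f from a to b."""
    dx = (b - a) / n
    return sum(f(a + (k + 0.5) * dx) for k in range(n)) * dx

def washer_area(r_out, r_in):
    """Area of an annulus with the given outer and inner radii."""
    return math.pi * (r_out ** 2 - r_in ** 2)

def volume_problem_1():
    # rotation around the y-axis, slices at height y
    return integrate(lambda y: washer_area(1 + math.sqrt(y), 1 - math.sqrt(y)), 0.0, 1.0)

def volume_problem_2():
    # rotation around x = -1, slices at height y
    return integrate(lambda y: washer_area(2 + math.sqrt(y), 2 - math.sqrt(y)), 0.0, 1.0)

def volume_problem_3():
    # rotation around y = 2, slices perpendicular to the x-axis
    return integrate(lambda x: washer_area(2 - (1 - x) ** 2, 1.0), 0.0, 2.0)

if __name__ == "__main__":
    print(volume_problem_1(), 8 * math.pi / 3)
    print(volume_problem_2(), 16 * math.pi / 3)
    print(volume_problem_3(), 56 * math.pi / 15)
```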


5. Volumes by cylindrical shells 

Instead of slicing a solid with planes you can also try to decompose it into cylindrical shells. The volume
of a cylinder of height $h$ and radius $r$ is $\pi r^2 h$ (height times area of the base). Therefore the volume of a
cylindrical shell of height $h$, (inner) radius $r$ and thickness $\Delta r$ is

$$\pi h(r + \Delta r)^2 - \pi h r^2 = \pi h(2r + \Delta r)\Delta r \approx 2\pi h r\,\Delta r.$$

Now consider the solid you get by revolving the region

$$\mathcal{R} = \{(x, y) \mid a \le x \le b,\ 0 \le y \le f(x)\}$$

around the $y$-axis. By partitioning the interval $a \le x \le b$ into many small intervals we can decompose the
solid into many thin shells. The volume of each shell will approximately be given by $2\pi x f(x)\Delta x$. Adding the
volumes of the shells, and taking the limit over finer and finer partitions we arrive at the following formula
for the volume of the solid of revolution:

$$V = 2\pi\int_a^b x f(x)\,dx. \tag{65}$$

If the region $\mathcal{R}$ is not the region under the graph, but rather the region
between the graphs of two functions $f(x) \le g(x)$, then we get

$$V = 2\pi\int_a^b x\{g(x) - f(x)\}\,dx.$$
Figure 7. Computing the volume of a circus tent using cylindrical shells. This particular tent is obtained
by rotating the graph of $y = e^{-x}$, $0 \le x \le 1$ around the $y$-axis.


5.1. Example: The solid obtained by rotating $\mathcal{R}$ about the $y$-axis, again. The region $\mathcal{R}$ from §4.1
can also be described as

$$\mathcal{R} = \{(x, y) \mid 0 \le x \le 2,\ f(x) \le y \le g(x)\},$$

where

$$f(x) = (x - 1)^2 \quad\text{and}\quad g(x) = 1.$$

The volume of the solid which we already computed in §4.1 is thus given by

$$V = 2\pi\int_0^2 x\{1 - (x - 1)^2\}\,dx = 2\pi\int_0^2 \{-x^3 + 2x^2\}\,dx = 2\pi\left[-\tfrac14 x^4 + \tfrac23 x^3\right]_0^2 = 8\pi/3,$$

which coincides with the answer we found in §4.1.
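The agreement between the two methods can also be seen numerically. In this sketch (added for illustration, with assumed helper names) the same solid is computed once with cylindrical shells and once with washers:

```python
import math

def integrate(f, a, b, n=20_000):
    """Midpoint Riemann sum for the integral of f from a to b."""
    dx = (b - a) / n
    return sum(f(a + (k + 0.5) * dx) for k in range(n)) * dx

def volume_by_shells():
    # shell at radius x has height g(x) - f(x) = 1 - (x - 1)^2
    return 2 * math.pi * integrate(lambda x: x * (1 - (x - 1) ** 2), 0.0, 2.0)

def volume_by_washers():
    # washer at height y has area 4 * pi * sqrt(y)
    return integrate(lambda y: 4 * math.pi * math.sqrt(y), 0.0, 1.0)

if __name__ == "__main__":
    print(volume_by_shells(), volume_by_washers(), 8 * math.pi / 3)
```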


6. Exercises 


475. Group Problem. 

What do the dots in "lim...” in equation (62) stand 
for? (i.e. what approaches what in this limit?) 

Draw and describe the solids whose 
volume you are asked to compute 
in the following problems: 

476. Find the volume enclosed by the paraboloid obtained by rotating the graph of $f(x) = R\sqrt{x/H}$ ($0 \le x \le H$)
around the $x$-axis. Here $R$ and $H$ are positive constants.
Draw the solid whose volume you are asked to compute,
and indicate what $R$ and $H$ are in your drawing.


Find the volume of the solids you get by rotating
each of the following graphs around the $x$-axis:

477. $f(x) = x$, $0 \le x \le 2$

478. f(x) = V^x, 0<x<2 

479. f(x) = (l + x 2 ) 1 ^ 2 , |a;| < 1 

480. $f(x) = \sin x$, $0 \le x \le \pi$

481. $f(x) = 1 - x^2$, $|x| \le 1$

482. $f(x) = \cos x$, $0 \le x \le \pi$ (!!)

483. $f(x) = 1/\cos x$, $0 \le x \le \pi/4$
484. Find the volume that results by rotating the semicircle
$y = \sqrt{R^2 - x^2}$ about the $x$-axis.

485. Let $T$ be the triangle $1 \le x \le 2$, $0 \le y \le 3x - 3$.

(a) Find the volume of the solid obtained by rotating
the triangle $T$ around the $x$-axis.

(b) Find the volume that results by rotating the
triangle $T$ around the $y$-axis.

(c) Find the volume that results by rotating the
triangle $T$ around the line $x = -1$.

(d) Find the volume that results by rotating the
triangle $T$ around the line $y = -1$.

486. Group Problem.

(a) A spherical bowl of radius $a$ contains water to a
depth $h < 2a$. Find the volume of the water in the bowl.
(Which solid of revolution is implied in this problem?)

(b) Water runs into a spherical bowl of radius 5 ft at
the rate of 0.2 ft³/sec. How fast is the water level rising
when the water is 4 ft deep?


7. Distance from velocity, velocity from acceleration 


7.1. Motion along a line. If an object is moving on a straight line, and if its position at time t is x(t), 
then we had defined the velocity to be v(t) = x'(t). Therefore the position is an antiderivative of the velocity, 
and the fundamental theorem of calculus says that 

r^b 

(66) J v(t) dt = x(t b ) - x(t a ), 

or 

/ tb 

v(t) dt. 

In words, the integral of the velocity gives you the distance travelled of the object (during the interval of 
integration). 

Equation (66) can also be obtained using Riemann sums. Namely, to see how far the object moved
between times $t_a$ and $t_b$ we choose a partition $t_a = t_0 < t_1 < \cdots < t_N = t_b$. Let $\Delta s_k$ be the distance travelled
during the time interval $(t_{k-1}, t_k)$. The length of this time interval is $\Delta t_k = t_k - t_{k-1}$. During this time
interval the velocity v(t) need not be constant, but if the time interval is short enough then we can estimate
the velocity by $v(c_k)$ where $c_k$ is some number between $t_{k-1}$ and $t_k$. We then have

$$\Delta s_k \approx v(c_k)\,\Delta t_k$$

and hence the total distance travelled is the sum of the travel distances for all time intervals $t_{k-1} < t < t_k$,
i.e.

$$\text{Distance travelled} \approx \Delta s_1 + \cdots + \Delta s_N = v(c_1)\Delta t_1 + \cdots + v(c_N)\Delta t_N.$$

The right hand side is again a Riemann sum for the integral in (66). As one makes the partition finer and
finer you therefore get

$$\text{Distance travelled} = \int_{t_a}^{t_b} v(t)\,dt.$$
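
The Riemann sum argument can be tried out numerically. The following Python sketch is only an illustration: the function name `riemann_distance`, the choice of the midpoint of each interval as $c_k$, and the sample velocity $v(t) = 3t^2$ are assumptions made here and do not come from the text.

```python
def riemann_distance(v, t_a, t_b, n):
    """Riemann sum v(c_1)*dt + ... + v(c_N)*dt on [t_a, t_b] with N = n equal steps."""
    dt = (t_b - t_a) / n
    total = 0.0
    for k in range(1, n + 1):
        c_k = t_a + (k - 0.5) * dt   # sample point between t_{k-1} and t_k
        total += v(c_k) * dt
    return total


# Hypothetical velocity v(t) = 3 t^2, for which x(t) = t^3 is an antiderivative.
v = lambda t: 3 * t**2
approx = riemann_distance(v, 0.0, 2.0, 1000)
exact = 2.0**3 - 0.0**3              # x(t_b) - x(t_a)
print(approx, exact)
```

With a finer partition (larger `n`) the sum should move closer to $x(t_b) - x(t_a)$, as equation (66) predicts.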


The return of the dummy. Often you want to write a formula for $x(t) = \cdots$ rather than $x(t_b) = \cdots$
as we did in (66), i.e. you want to say what the position is at time t, instead of at time $t_b$. For instance, you
might want to express the fact that the position x(t) is equal to the initial position x(0) plus the integral of
the velocity from 0 to t. To do this you cannot write

$$x(t) = x(0) + \int_0^t v(t)\,dt \qquad \text{BAD FORMULA}$$

because the variable t gets used in two incompatible ways: the t in x(t) on the left, and in the upper bound
on the integral ($\int_0^t$) are the same, but they are not the same as the two t's in $v(t)\,dt$. The latter is a dummy
variable (see §8 and §3.3). To fix this formula we should choose a different letter or symbol for the integration
variable. A common choice in this situation is to decorate the integration variable with a prime ($t'$), a tilde
($\tilde t$) or a bar ($\bar t$). So you can write

$$x(t) = x(0) + \int_0^t v(\bar t)\,d\bar t.$$
7.2. Velocity from acceleration. The acceleration of the object is by definition the rate of change of
its velocity,

$$a(t) = v'(t),$$

so you have

$$v(t) = v(0) + \int_0^t a(\bar t)\,d\bar t.$$

Conclusion: If you know the acceleration a(t) at all times t, and also the velocity v(0) at time t = 0, then you
can compute the velocity v(t) at all times by integrating.


7.3. Free fall in a constant gravitational field. If you drop an object then it will fall, and as it falls
its velocity increases. The object's motion is described by the fact that its acceleration is constant. This
constant is called g and is about 9.8 m/sec² ≈ 32 ft/sec². If we designate the upward direction as positive then
v(t) is the upward velocity of the object, and this velocity is actually decreasing. Therefore the constant
acceleration is negative: it is $-g$.

If you write h(t) for the height of the object at time t then its velocity is v(t) = h'(t) and its acceleration
is h''(t). Since the acceleration is constant you have the following formula for the velocity at time t:

$$v(t) = v(0) + \int_0^t (-g)\,d\bar t = v(0) - gt.$$

Here v(0) is the velocity at time t = 0 (the “initial velocity”).

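To get the height of the object at any time t you must integrate the velocity:

$$h(t) = h(0) + \int_0^t v(\bar t)\,d\bar t \qquad \text{(note the use of the dummy } \bar t\text{)}$$

$$= h(0) + \int_0^t \{v(0) - g\bar t\}\,d\bar t \qquad \text{(use } v(\bar t) = v(0) - g\bar t\text{)}$$

$$= h(0) + \left[v(0)\bar t - \tfrac12 g\bar t^2\right]_{\bar t = 0}^{\bar t = t}$$

$$= h(0) + v(0)t - \tfrac12 gt^2.$$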

For instance, if you launch the object upwards with velocity 5 ft/sec from a height of 10 ft, then you have

$$h(0) = 10\ \text{ft}, \qquad v(0) = +5\ \text{ft/sec},$$

and thus

$$h(t) = 10 + 5t - 32t^2/2 = 10 + 5t - 16t^2.$$

The object reaches its maximum height when h(t) has a maximum, which is when h'(t) = 0. To find that
height you compute $h'(t) = 5 - 32t$ and conclude that h(t) is maximal at $t = \tfrac{5}{32}$ sec. The maximal height is
then

$$h_{\max} = h\left(\tfrac{5}{32}\right) = 10 + \tfrac{25}{32} - \tfrac{25}{64} = 10\tfrac{25}{64}\ \text{ft}.$$
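
The arithmetic in this example can be checked with exact fractions. The sketch below is illustrative only; the helper name `height` and the use of Python's `fractions` module are choices made here, not part of the text.

```python
from fractions import Fraction


def height(t, h0, v0, g):
    """h(t) = h(0) + v(0) t - g t^2 / 2, with the upward direction positive."""
    return h0 + v0 * t - g * t * t / 2


# Data of the example: h(0) = 10 ft, v(0) = 5 ft/sec, g = 32 ft/sec^2.
h0, v0, g = Fraction(10), Fraction(5), Fraction(32)
t_max = v0 / g                      # solves h'(t) = v(0) - g t = 0
h_max = height(t_max, h0, v0, g)
print(t_max, h_max)                 # 5/32 and 665/64, i.e. 10 + 25/64 ft
```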


7.4. Motion in the plane: parametric curves. To describe the motion of an object in the plane
you could keep track of its x and y coordinates at all times t. This would give you two functions of t, namely
x(t) and y(t), both of which are defined on the same interval $t_0 \le t \le t_1$ which describes the duration of the
motion you are describing. In this context a pair of functions (x(t), y(t)) is called a parametric curve.

As an example, consider the motion described by

$$x(t) = \cos t, \quad y(t) = \sin t \quad (0 \le t \le 2\pi).$$

In this motion the point (x(t), y(t)) lies on the unit circle since

$$x(t)^2 + y(t)^2 = \cos^2 t + \sin^2 t = 1.$$

As t increases from 0 to $2\pi$ the point (x(t), y(t)) goes around the unit circle exactly once, in the
counter-clockwise direction.



Figure 8. Two motions in the plane. On the left $x(t) = \cos t$, $y(t) = \sin t$ with $0 \le t \le 2\pi$, and on the
right $x(t) = t$, $y(t) = \sqrt{1 - t^2}$ with $-1 \le t \le 1$.


In another example one could consider

$$x(t) = t, \quad y(t) = \sqrt{1 - t^2}, \quad (-1 \le t \le 1).$$

Here at all times the x and y coordinates satisfy

$$x(t)^2 + y(t)^2 = 1$$

again so that the point $(x(t), y(t)) = (t, \sqrt{1 - t^2})$ again lies on the unit circle. Unlike the previous example
we now always have $y(t) \ge 0$ (since y(t) is the square root of something), and unlike the previous example the
motion is only defined for $-1 \le t \le 1$. As t increases from -1 to +1, x(t) = t does the same, and hence the
point (x(t), y(t)) moves along the upper half of the unit circle from the leftmost point to the rightmost point.
7.5. The velocity of an object moving in the plane. We have seen
that the velocity of an object which is moving along a line is the derivative of
its position. If the object is allowed to move in the plane, so that its motion
is described by a parametric curve (x(t), y(t)), then we can differentiate both
x(t) and y(t), which gives us x'(t) and y'(t), and which leaves us with the
following question: At what speed is a particle moving if it is undergoing the
motion (x(t), y(t)) ($t_a \le t \le t_b$)?

To answer this question we consider a short time interval $(t, t + \Delta t)$. During
this time interval the particle moves from (x(t), y(t)) to $(x(t + \Delta t), y(t + \Delta t))$. Hence it has traveled a distance

$$\Delta s = \sqrt{(\Delta x)^2 + (\Delta y)^2}$$

where

$$\Delta x = x(t + \Delta t) - x(t), \quad\text{and}\quad \Delta y = y(t + \Delta t) - y(t).$$

Dividing by $\Delta t$ you get

$$\frac{\Delta s}{\Delta t} = \sqrt{\left(\frac{\Delta x}{\Delta t}\right)^2 + \left(\frac{\Delta y}{\Delta t}\right)^2}$$

for the average velocity over the time interval $[t, t + \Delta t]$. Letting $\Delta t \to 0$ you find the velocity at time t to be

(67)    $$v(t) = \sqrt{\left(\frac{dx}{dt}\right)^2 + \left(\frac{dy}{dt}\right)^2}.$$



7.6. Example: the two motions on the circle from §7.4. If a point moves along a circle according
to $x(t) = \cos t$, $y(t) = \sin t$ (figure 8 on the left) then

$$\frac{dx}{dt} = -\sin t, \qquad \frac{dy}{dt} = +\cos t$$

so

$$v(t) = \sqrt{(-\sin t)^2 + (\cos t)^2} = 1.$$

The velocity of this motion is therefore always the same; the point $(\cos t, \sin t)$ moves along the unit circle
with constant velocity.

In the second example in §7.4 we had $x(t) = t$, $y(t) = \sqrt{1 - t^2}$, so

$$\frac{dx}{dt} = 1, \qquad \frac{dy}{dt} = \frac{-t}{\sqrt{1 - t^2}}$$

whence

$$v(t) = \sqrt{1^2 + \frac{t^2}{1 - t^2}} = \frac{1}{\sqrt{1 - t^2}}.$$

Therefore the point $(t, \sqrt{1 - t^2})$ moves along the upper half of the unit circle from the left to the right, and
its velocity changes according to $v = 1/\sqrt{1 - t^2}$.
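
Both velocity computations can be checked numerically by approximating the derivatives with difference quotients. This is a rough sketch under assumptions made here: the helper name `speed`, the central-difference step `dt`, and the sample times are illustrative and not taken from the text.

```python
import math


def speed(x, y, t, dt=1e-6):
    """Approximate sqrt((dx/dt)^2 + (dy/dt)^2) using central differences."""
    dx = (x(t + dt) - x(t - dt)) / (2 * dt)
    dy = (y(t + dt) - y(t - dt)) / (2 * dt)
    return math.hypot(dx, dy)


# First motion: x = cos t, y = sin t. The speed should be 1 at any time.
v1 = speed(math.cos, math.sin, 0.7)

# Second motion: x = t, y = sqrt(1 - t^2). The speed should be 1/sqrt(1 - t^2).
t = 0.6
v2 = speed(lambda s: s, lambda s: math.sqrt(1 - s * s), t)
print(v1, v2, 1 / math.sqrt(1 - t * t))
```

At t = 0.6 the formula gives 1/0.8 = 1.25, so the second printed value should be close to that.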


8. The length of a curve 

8.1. Length of a parametric curve. Let (x(t), y(t)) be some parametric curve defined for $t_a \le t \le t_b$.

To find the length of this curve you can reason as follows: The length of the curve should be the distance
travelled by the point (x(t), y(t)) as t increases from $t_a$ to $t_b$. At each moment in time the velocity v(t) of the
point is given by (67), and therefore the distance traveled should be

(68)    $$s = \int_{t_a}^{t_b} v(t)\,dt = \int_{t_a}^{t_b} \sqrt{x'(t)^2 + y'(t)^2}\,dt.$$
Alternatively, you could try to compute the distance travelled by means of
Riemann sums. Choose a partition

$$t_a = t_0 < t_1 < \cdots < t_N = t_b$$

of the interval $[t_a, t_b]$. You then get a sequence of points $P_0(x(t_0), y(t_0))$, $P_1(x(t_1), y(t_1))$, ...,
$P_N(x(t_N), y(t_N))$, and after “connecting the dots” you get a polygon. You
could approximate the length of the curve by computing the length of this polygon. The distance between
two consecutive points $P_{k-1}$ and $P_k$ is

$$\Delta s_k = \sqrt{(\Delta x_k)^2 + (\Delta y_k)^2} = \sqrt{\left(\frac{\Delta x_k}{\Delta t_k}\right)^2 + \left(\frac{\Delta y_k}{\Delta t_k}\right)^2}\;\Delta t_k \approx \sqrt{x'(c_k)^2 + y'(c_k)^2}\;\Delta t_k$$

where we have approximated the difference quotients

$$\frac{\Delta x_k}{\Delta t_k} \quad\text{and}\quad \frac{\Delta y_k}{\Delta t_k}$$

by the derivatives $x'(c_k)$ and $y'(c_k)$ for some $c_k$ in the interval $[t_{k-1}, t_k]$.
The total length of the polygon is then

$$\sqrt{x'(c_1)^2 + y'(c_1)^2}\,\Delta t_1 + \cdots + \sqrt{x'(c_N)^2 + y'(c_N)^2}\,\Delta t_N.$$

This is a Riemann sum for the integral $\int_{t_a}^{t_b} \sqrt{x'(t)^2 + y'(t)^2}\,dt$, and hence we find (once more) that the length
of the curve is

$$s = \int_{t_a}^{t_b} \sqrt{x'(t)^2 + y'(t)^2}\,dt.$$
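
The “connecting the dots” idea is easy to experiment with. The Python sketch below is illustrative only: the function name `polygon_length` and the use of equally spaced partition points are assumptions made here. It computes the length of the polygon through $P_0, \ldots, P_N$ for the unit circle $x = \cos t$, $y = \sin t$.

```python
import math


def polygon_length(x, y, t_a, t_b, n):
    """Length of the polygon through P_0, ..., P_N on the curve (x(t), y(t))."""
    ts = [t_a + (t_b - t_a) * k / n for k in range(n + 1)]
    return sum(
        math.hypot(x(ts[k]) - x(ts[k - 1]), y(ts[k]) - y(ts[k - 1]))
        for k in range(1, n + 1)
    )


# Finer partitions of [0, 2*pi] give polygons that hug the unit circle better.
for n in (4, 16, 64, 256):
    print(n, polygon_length(math.cos, math.sin, 0.0, 2 * math.pi, n))
```

For n = 4 the polygon is a square inscribed in the circle, with length $4\sqrt2$; as n grows the printed lengths should increase toward $2\pi$.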


8.2. The length of the graph of a function. The graph of a function (y = f(x) with $a \le x \le b$) is
also a curve in the plane, and you can ask what its length is. We will now find this length by representing the
graph as a parametric curve and applying the formula (68) from the previous section.

The standard method of representing the graph of a function y = f(x) by a parametric curve is to choose

$$x(t) = t, \quad\text{and}\quad y(t) = f(t), \quad\text{for } a \le t \le b.$$

This parametric curve traces the graph of y = f(x) from left to right as t increases from a to b.

Since x'(t) = 1 and y'(t) = f'(t) we find that the length of the graph is

$$L = \int_a^b \sqrt{1 + f'(t)^2}\,dt.$$

The variable t in this integral is a dummy variable and we can replace it with any other variable we like, for
instance, x:

(69)    $$L = \int_a^b \sqrt{1 + f'(x)^2}\,dx.$$

In Leibniz' notation we have y = f(x) and f'(x) = dy/dx so that Leibniz would have written

$$L = \int_a^b \sqrt{1 + \left(\frac{dy}{dx}\right)^2}\,dx.$$
9. Examples of length computations


9.1. Length of a circle. In §7.4 we parametrized the unit circle by

$$x(t) = \cos t, \quad y(t) = \sin t, \quad (0 \le t \le 2\pi)$$

and in §7.6 we computed $\sqrt{x'(t)^2 + y'(t)^2} = 1$. Therefore our formula tells us that the length of the unit circle is

$$L = \int_0^{2\pi} \sqrt{x'(t)^2 + y'(t)^2}\,dt = \int_0^{2\pi} 1\,dt = 2\pi.$$

This cannot be a PROOF that the unit circle has length $2\pi$ since we have already used that fact to define
angles in radians, to define the trig functions Sine and Cosine, and to find their derivatives. But our
computation shows that the length formula (68) is at least consistent with what we already knew.


9.2. Length of a parabola. Consider our old friend, the parabola $y = x^2$, $0 \le x \le 1$. While the area
under its graph was easy to compute ($\tfrac13$), its length turns out to be much more complicated.

Our length formula (69) says that the length of the parabola is given by

$$L = \int_0^1 \sqrt{1 + \left(\frac{dx^2}{dx}\right)^2}\,dx = \int_0^1 \sqrt{1 + 4x^2}\,dx.$$

To find this integral you would have to use one of the following (not at all obvious) substitutions¹

$$x = \tfrac14\left(z - \tfrac1z\right) \quad \text{(then } 1 + 4x^2 = \tfrac14 (z + 1/z)^2 \text{ so you can simplify the } \sqrt{\cdot}\text{)}$$

or (if you like hyperbolic functions)

$$x = \tfrac12 \sinh w \quad \text{(in which case } \sqrt{1 + 4x^2} = \cosh w\text{).}$$
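
Even without carrying out the substitution by hand, you can estimate the length numerically and compare it with the closed form that the hyperbolic substitution leads to, $\tfrac{\sqrt5}{2} + \tfrac14 \sinh^{-1}(2)$. The sketch below is only an illustration; the helper `midpoint_integral` (a plain midpoint Riemann sum) and the number of subintervals are choices made here.

```python
import math


def midpoint_integral(f, a, b, n=100_000):
    """Midpoint Riemann sum for the integral of f over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (k + 0.5) * h) for k in range(n)) * h


# Length of y = x^2, 0 <= x <= 1: integral of sqrt(1 + 4 x^2).
numeric = midpoint_integral(lambda x: math.sqrt(1 + 4 * x * x), 0.0, 1.0)

# Closed form obtained from x = (1/2) sinh w: sqrt(5)/2 + asinh(2)/4.
closed = math.sqrt(5) / 2 + math.asinh(2) / 4
print(numeric, closed)
```

The two printed numbers should agree to many digits (both are roughly 1.479).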



9.3. Length of the graph of the Sine function. To compute the length of the curve given by
$y = \sin x$, $0 \le x \le \pi$ you would have to compute this integral:

(70)    $$L = \int_0^\pi \sqrt{1 + \left(\frac{d\sin x}{dx}\right)^2}\,dx = \int_0^\pi \sqrt{1 + \cos^2 x}\,dx.$$

Unfortunately this is not an integral which can be computed in terms of the functions we know in this course
(it's an “elliptic integral of the second kind.”) This happens very often with the integrals that you run into
when you try to compute the length of a curve. In spite of the fact that we get stuck when we try to compute
the integral in (70), the formula is not useless. For example, since $-1 \le \cos x \le 1$ we know that

$$1 \le \sqrt{1 + \cos^2 x} \le \sqrt{1 + 1} = \sqrt2,$$

and therefore the length of the Sine graph is bounded by

$$\int_0^\pi 1\,dx \le \int_0^\pi \sqrt{1 + \cos^2 x}\,dx \le \int_0^\pi \sqrt2\,dx,$$

i.e.

$$\pi \le L \le \pi\sqrt2.$$
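
Although the integral (70) has no elementary antiderivative, it can still be approximated numerically, and the result can be compared with the two bounds. This is a sketch under assumptions made here: the helper `midpoint_integral` is a plain midpoint Riemann sum and is not part of the text.

```python
import math


def midpoint_integral(f, a, b, n=100_000):
    """Midpoint Riemann sum for the integral of f over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (k + 0.5) * h) for k in range(n)) * h


# Length of y = sin x, 0 <= x <= pi, from formula (70).
L = midpoint_integral(lambda x: math.sqrt(1 + math.cos(x) ** 2), 0.0, math.pi)
lower, upper = math.pi, math.pi * math.sqrt(2)
print(lower, L, upper)
```

The computed value (about 3.82) should land between the lower bound $\pi \approx 3.14$ and the upper bound $\pi\sqrt2 \approx 4.44$.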


10. Exercises

487. Find the length of the piece of the graph of $y = \sqrt{1 - x^2}$ where $0 \le x \le \tfrac12$.

The graph is an arc of a circle, so there are two ways of computing this length. One uses geometry (length of a circular
arc = radius × angle), the other uses an integral.

Use both methods and check that you get the same
answer.

488. Compute the length of the part of the evolute of the
circle, given by

$$x(t) = \cos t - t \sin t, \quad y(t) = \sin t + t \cos t$$

where $0 \le t \le \pi$.

489. Group Problem.

Show that the Archimedean spiral, given by

$$x(\theta) = \theta\cos\theta, \quad y(\theta) = \theta\sin\theta, \quad 0 \le \theta \le \pi$$

has the same length as the parabola given by

$$y = \tfrac12 x^2, \quad 0 \le x \le \pi.$$

Hint: you can set up integrals for both lengths. If you
get the same integral in both cases, then you know the
two curves have the same length (even if you don't try
to compute the integrals).

¹ Many calculus textbooks will tell you to substitute $x = \tan\theta$, but the resulting integral is still not easy.

11. Work done by a force 

11.1. Work as an integral. In Newtonian mechanics a force which acts on an object in motion performs 
a certain amount of work, i.e. it spends a certain amount of energy. If the force which acts is constant, then 
the work done by this force is 

Work = Force × Displacement.


For example if you are pushing a box forward then there will be two forces 
acting on the box: the force you apply, and the friction force of the floor on 
the box. The amount of work you do is the product of the force you exert and 
the length of the displacement. Both displacement and the force you apply are 
pointed towards the right, so both are positive, and the work you do (energy 
you provide to the box) is positive. 

The amount of work done by the friction is similarly the product of the friction force and the displacement. 
Here the displacement is still to the right, but the friction force points to the left, so it is negative. The work 
done by the friction force is therefore negative. Friction extracts energy from the box. 

Suppose now that the force F(t) on the box is not constant, and that its motion is described by saying
that its position at time t is x(t). The basic formula work = force × displacement does not apply directly
since it assumes that the force is constant. To compute the work done by the varying force F(t) we choose a
partition of the time interval $t_a \le t \le t_b$ into

$$t_a = t_0 < t_1 < \cdots < t_{N-1} < t_N = t_b.$$

In each short time interval $t_{k-1} \le t \le t_k$ we assume the force is (almost) constant and we approximate it by
$F(c_k)$ for some $t_{k-1} \le c_k \le t_k$. If we also assume that the velocity v(t) = x'(t) is approximately constant
between times $t_{k-1}$ and $t_k$ then the displacement during this time interval will be

$$x(t_k) - x(t_{k-1}) \approx v(c_k)\,\Delta t_k,$$

where $\Delta t_k = t_k - t_{k-1}$. Therefore the work done by the force F during the time interval $t_{k-1} \le t \le t_k$ is approximately

$$\Delta W_k = F(c_k)\,v(c_k)\,\Delta t_k.$$

Adding the work done during each time interval we get the total work done by the force between time $t_a$ and
$t_b$:

$$W \approx F(c_1)v(c_1)\Delta t_1 + \cdots + F(c_N)v(c_N)\Delta t_N.$$

Again we have a Riemann sum for an integral. If we take the limit over finer and finer partitions we therefore
find that the work done by the force F(t) on an object whose motion is described by x(t) is

(71)    $$W = \int_{t_a}^{t_b} F(t)v(t)\,dt,$$

in which v(t) = x'(t) is the velocity of the object.
11.2. Kinetic energy. Newton's famous law relating the force exerted on an object and its motion
says F = ma, where a is the acceleration of the object, m is its mass, and F is the combination of all forces
acting on the object. If the position of the object at time t is x(t), then its velocity and acceleration are
v(t) = x'(t) and a(t) = v'(t) = x''(t), and thus the total force acting on the object is

$$F(t) = ma(t) = m\frac{dv(t)}{dt}.$$

The work done by the total force is therefore

(72)    $$W = \int_{t_a}^{t_b} F(t)v(t)\,dt = \int_{t_a}^{t_b} m\frac{dv(t)}{dt}\,v(t)\,dt.$$

Even though we have not assumed anything about the motion, so we don't know anything about the velocity
v(t), we can still do this integral. The key is to notice that, by the chain rule,

$$m\frac{dv(t)}{dt}\,v(t) = \frac{d\,\tfrac12 m v(t)^2}{dt}.$$

(Remember that m is a constant.) This says that the quantity

$$K(t) = \tfrac12 m v(t)^2$$

is the antiderivative we need to do the integral (72). We get

$$W = \int_{t_a}^{t_b} m\frac{dv(t)}{dt}\,v(t)\,dt = \int_{t_a}^{t_b} K'(t)\,dt = K(t_b) - K(t_a).$$

In Newtonian mechanics the quantity K(t) is called the kinetic energy of the object, and our computation 
shows that the amount by which the kinetic energy of an object increases is equal to the amount of work done 
on the object. 
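
The relation between work and kinetic energy can be checked numerically for any particular motion. In the sketch below the mass `m`, the velocity $v(t) = 3 + \sin t$, the time interval, and the helper `midpoint_integral` are all hypothetical choices made here for illustration.

```python
import math


def midpoint_integral(f, a, b, n=100_000):
    """Midpoint Riemann sum for the integral of f over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (k + 0.5) * h) for k in range(n)) * h


m = 2.0                                  # hypothetical mass
v = lambda t: 3.0 + math.sin(t)          # hypothetical velocity v(t)
a = lambda t: math.cos(t)                # acceleration a(t) = v'(t)
F = lambda t: m * a(t)                   # Newton's law F = m a
K = lambda t: 0.5 * m * v(t) ** 2        # kinetic energy K(t)

t_a, t_b = 0.0, 2.0
work = midpoint_integral(lambda t: F(t) * v(t), t_a, t_b)   # formula (71)
print(work, K(t_b) - K(t_a))
```

The two printed numbers should agree closely, in line with $W = K(t_b) - K(t_a)$.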


12. Work done by an electric current 



If at time t an electric current I(t) (measured in Ampere) flows through an electric
circuit, and if the voltage across this circuit is V(t) (measured in Volts) then the energy
supplied to the circuit per second is I(t)V(t). Therefore the total energy supplied during
a time interval $t_0 \le t \le t_1$ is the integral

$$\text{Energy supplied} = \int_{t_0}^{t_1} I(t)V(t)\,dt$$

(measured in Joule; the power consumption of a circuit is defined to be how much energy
it consumes per time unit, and the power consumption of a circuit which consumes 1 Joule per second is said
to be one Watt.)


If a certain voltage is applied to a simple circuit (like a light bulb) then the current flowing through that
circuit is determined by the resistance R of that circuit by Ohm's law² which says

$$I = \frac{V}{R}.$$

² http://en.wikipedia.org/wiki/Ohm's_law
12.1. Example. If the resistance of a light bulb is R = 200 Ω, and if the voltage applied to it is

$$V(t) = 150 \sin 2\pi f t$$

where $f = 50\ \text{sec}^{-1}$ is the frequency, then how much energy does the current supply to the light bulb in one
second?

To compute this we first find the current using Ohm's law,

$$I(t) = \frac{V(t)}{R} = \frac{150}{200}\sin 2\pi f t = 0.75 \sin 2\pi f t \quad \text{(Amp)}.$$

The energy supplied in one second is then

$$E = \int_0^{1\,\text{sec}} I(t)V(t)\,dt = \int_0^1 (150 \sin 2\pi f t) \times (0.75 \sin 2\pi f t)\,dt = 112.5 \int_0^1 \sin^2(2\pi f t)\,dt.$$

You can do this last integral by using the double angle formula for the cosine, to rewrite

$$\sin^2(2\pi f t) = \tfrac12\{1 - \cos 4\pi f t\} = \tfrac12 - \tfrac12 \cos 4\pi f t.$$

Keep in mind that f = 50, and you find that the integral is

$$\int_0^1 \sin^2(2\pi f t)\,dt = \left[\frac{t}{2} - \frac{1}{8\pi f}\sin 4\pi f t\right]_0^1 = \frac12,$$

and hence the energy supplied to the light bulb during one second is

$$E = 112.5 \times \tfrac12 = 56.25\ \text{(Joule)}.$$
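
The value 56.25 Joule can be confirmed by integrating I(t)V(t) numerically. The sketch below is illustrative; the helper `midpoint_integral` is a plain midpoint Riemann sum chosen here and is not part of the text.

```python
import math


def midpoint_integral(f, a, b, n=100_000):
    """Midpoint Riemann sum for the integral of f over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (k + 0.5) * h) for k in range(n)) * h


R, f = 200.0, 50.0                                    # Ohm, 1/sec
V = lambda t: 150.0 * math.sin(2 * math.pi * f * t)   # applied voltage (Volt)
I = lambda t: V(t) / R                                # Ohm's law (Amp)

E = midpoint_integral(lambda t: I(t) * V(t), 0.0, 1.0)
print(E)   # should be close to 56.25 Joule
```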
Chapter seven:


Line and surface integrals


1.6 Surfaces 

In the previous section we discussed planes in Euclidean space. A plane is an example of
a surface, which we will define informally⁸ as the solution set of the equation F(x,y,z) = 0
in $\mathbb{R}^3$, for some real-valued function F. For example, a plane given by ax + by + cz + d = 0
is the solution set of F(x,y,z) = 0 for the function F(x,y,z) = ax + by + cz + d. Surfaces are
2-dimensional. The plane is the simplest surface, since it is “flat”. In this section we will
look at some surfaces that are more complex, the most important of which are the sphere
and the cylinder.

Definition 1.9. A sphere S is the set of all points (x,y,z) in $\mathbb{R}^3$ which are a fixed distance r
(called the radius) from a fixed point $P_0 = (x_0, y_0, z_0)$ (called the center of the sphere):

$$S = \{(x,y,z) : (x - x_0)^2 + (y - y_0)^2 + (z - z_0)^2 = r^2\}$$    (1.29)

Using vector notation, this can be written in the equivalent form:

$$S = \{\mathbf{x} : \|\mathbf{x} - \mathbf{x}_0\| = r\}$$    (1.30)

where $\mathbf{x} = (x,y,z)$ and $\mathbf{x}_0 = (x_0, y_0, z_0)$ are vectors.

Figure 1.6.1 illustrates the vectorial approach to spheres.

Note in Figure 1.6.1(a) that the intersection of the sphere with the xy-plane is a circle
of radius r (i.e. a great circle, given by $x^2 + y^2 = r^2$ as a subset of $\mathbb{R}^2$). Similarly for the
intersections with the xz-plane and the yz-plane. In general, a plane intersects a sphere
either at a single point or in a circle.

⁸ See O'Neill for a deeper and more rigorous discussion of surfaces.
Example 1.27. Find the intersection of the sphere $x^2 + y^2 + z^2 = 169$ with the plane z = 12.


Solution: The sphere is centered at the origin and has radius
$13 = \sqrt{169}$, so it does intersect the plane z = 12. Putting
z = 12 into the equation of the sphere gives

$$x^2 + y^2 + 12^2 = 169$$

$$x^2 + y^2 = 169 - 144 = 25 = 5^2$$

which is a circle of radius 5 centered at (0,0,12), parallel to
the xy-plane (see Figure 1.6.2).



If the equation in formula (1.29) is multiplied out, we get an equation of the form:

$$x^2 + y^2 + z^2 + ax + by + cz + d = 0$$    (1.31)

for some constants a, b, c and d. Conversely, an equation of this form may describe a sphere,
which can be determined by completing the square for the x, y and z variables.


Example 1.28. Is $2x^2 + 2y^2 + 2z^2 - 8x + 4y - 16z + 10 = 0$ the equation of a sphere?
Solution: Dividing both sides of the equation by 2 gives

$$x^2 + y^2 + z^2 - 4x + 2y - 8z + 5 = 0$$
$$(x^2 - 4x + 4) + (y^2 + 2y + 1) + (z^2 - 8z + 16) + 5 - 4 - 1 - 16 = 0$$
$$(x - 2)^2 + (y + 1)^2 + (z - 4)^2 = 16$$

which is a sphere of radius 4 centered at (2,-1,4).
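
Completing the square in equation (1.31) always leads to the same pattern: the center is $(-a/2, -b/2, -c/2)$ and the squared radius is $(a^2 + b^2 + c^2)/4 - d$, which must be positive for a sphere. The Python sketch below simply encodes that pattern; the function name `sphere_from_general` is a hypothetical helper introduced here for illustration.

```python
def sphere_from_general(a, b, c, d):
    """Return (center, r_squared) for x^2 + y^2 + z^2 + a x + b y + c z + d = 0.

    The equation describes a sphere only when r_squared is positive.
    """
    center = (-a / 2, -b / 2, -c / 2)
    r_squared = (a * a + b * b + c * c) / 4 - d
    return center, r_squared


# Example 1.28 after dividing by 2: a, b, c, d = -4, 2, -8, 5.
center, r2 = sphere_from_general(-4, 2, -8, 5)
print(center, r2)   # (2.0, -1.0, 4.0) 16.0
```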


Example 1.29. Find the point(s) of intersection (if any) of the sphere from Example 1.28
and the line x = 3 + t, y = 1 + 2t, z = 3 - t.

Solution: Put the equations of the line into the equation of the sphere, which was
$(x - 2)^2 + (y + 1)^2 + (z - 4)^2 = 16$, and solve for t:

$$(3 + t - 2)^2 + (1 + 2t + 1)^2 + (3 - t - 4)^2 = 16$$
$$(t + 1)^2 + (2t + 2)^2 + (-t - 1)^2 = 16$$
$$6t^2 + 12t - 10 = 0$$

The quadratic formula gives the solutions $t = -1 \pm \frac{4}{\sqrt6}$. Putting those two values into the
equations of the line gives the following two points of intersection:

$$\left(2 + \frac{4}{\sqrt6},\ -1 + \frac{8}{\sqrt6},\ 4 - \frac{4}{\sqrt6}\right) \quad\text{and}\quad \left(2 - \frac{4}{\sqrt6},\ -1 - \frac{8}{\sqrt6},\ 4 + \frac{4}{\sqrt6}\right)$$
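
The substitution used in this example works for any line and sphere: putting the line into the sphere's equation gives a quadratic in t. The following Python sketch is one possible way to organize that computation; the function name `line_sphere` and its argument layout (a point and a direction vector for the line) are assumptions made here.

```python
import math


def line_sphere(p, d, center, r):
    """Points p + t*d lying on the sphere of radius r about center."""
    w = [pi - ci for pi, ci in zip(p, center)]
    A = sum(di * di for di in d)                      # coefficient of t^2
    B = 2 * sum(di * wi for di, wi in zip(d, w))      # coefficient of t
    C = sum(wi * wi for wi in w) - r * r              # constant term
    disc = B * B - 4 * A * C
    if disc < 0:
        return []                                     # the line misses the sphere
    ts = sorted({(-B + s * math.sqrt(disc)) / (2 * A) for s in (1, -1)})
    return [tuple(pi + t * di for pi, di in zip(p, d)) for t in ts]


# Line x = 3 + t, y = 1 + 2t, z = 3 - t and the sphere from Example 1.28.
pts = line_sphere((3, 1, 3), (1, 2, -1), (2, -1, 4), 4)
for x, y, z in pts:
    print(x, y, z, (x - 2) ** 2 + (y + 1) ** 2 + (z - 4) ** 2)
```

For this line the coefficients come out as A = 6, B = 12, C = -10, matching $6t^2 + 12t - 10 = 0$, and the last printed number on each row should be (very nearly) 16.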
If two spheres intersect, they do so either at a single point or in a circle.


Example 1.30. Find the intersection (if any) of the spheres $x^2 + y^2 + z^2 = 25$ and
$x^2 + y^2 + (z - 2)^2 = 16$.

Solution: For any point (x,y,z) on both spheres, we see that

$$x^2 + y^2 + z^2 = 25 \ \Rightarrow\ x^2 + y^2 = 25 - z^2, \text{ and}$$

$$x^2 + y^2 + (z - 2)^2 = 16 \ \Rightarrow\ x^2 + y^2 = 16 - (z - 2)^2, \text{ so}$$

$$16 - (z - 2)^2 = 25 - z^2 \ \Rightarrow\ 4z - 4 = 9 \ \Rightarrow\ z = 13/4$$

$$\Rightarrow\ x^2 + y^2 = 25 - (13/4)^2 = 231/16$$

∴ The intersection is the circle $x^2 + y^2 = \frac{231}{16}$ of radius $\frac{\sqrt{231}}{4} \approx 3.8$ centered at $(0, 0, \frac{13}{4})$.
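
The same elimination works whenever one sphere is centered at the origin and the other on the z-axis. The sketch below is limited to that special case; the function name `spheres_on_z_axis` and its parameters are hypothetical and introduced only for illustration.

```python
import math


def spheres_on_z_axis(r1, z2, r2):
    """Intersect x^2+y^2+z^2 = r1^2 with x^2+y^2+(z-z2)^2 = r2^2, where z2 != 0.

    Returns (z, radius) of the circle of intersection, or None if the
    spheres do not meet. A radius of 0 means they touch at a single point.
    """
    # Subtracting the two equations leaves a linear equation in z.
    z = (r1 * r1 - r2 * r2 + z2 * z2) / (2 * z2)
    rho_squared = r1 * r1 - z * z
    if rho_squared < 0:
        return None
    return z, math.sqrt(rho_squared)


print(spheres_on_z_axis(5, 2, 4))   # z = 3.25, radius = sqrt(231)/4
```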

The cylinders that we will consider are right circular cylinders. These are cylinders obtained
by moving a line L along a circle C in $\mathbb{R}^3$ in a way so that L is always perpendicular
to the plane containing C. We will only consider the cases where the plane containing C is
parallel to one of the three coordinate planes (see Figure 1.6.3).


Figure 1.6.3 Cylinders in $\mathbb{R}^3$

For example, the equation of a cylinder whose base circle C lies in the xy-plane and is
centered at (a,b,0) and has radius r is

$$(x - a)^2 + (y - b)^2 = r^2,$$    (1.32)

where the value of the z coordinate is unrestricted. Similar equations can be written when
the base circle lies in one of the other coordinate planes. A plane intersects a right circular
cylinder in a circle, ellipse, or one or two lines, depending on whether that plane is parallel,
oblique⁹, or perpendicular, respectively, to the plane containing C. The intersection of a
surface with a plane is called the trace of the surface.

⁹ i.e. at an angle strictly between 0° and 90°.
The equations of spheres and cylinders are examples of second-degree equations in $\mathbb{R}^3$, i.e.
equations of the form

$$Ax^2 + By^2 + Cz^2 + Dxy + Exz + Fyz + Gx + Hy + Iz + J = 0$$    (1.33)

for some constants A, B, ..., J. If the above equation is not that of a sphere, cylinder, plane,
line or point, then the resulting surface is called a quadric surface.


One type of quadric surface is the ellipsoid, given
by an equation of the form:

$$\frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2} = 1$$    (1.34)

In the case where a = b = c, this is just a sphere.
In general, an ellipsoid is egg-shaped (think of an
ellipse rotated around its major axis). Its traces in
the coordinate planes are ellipses.

Figure 1.6.4 Ellipsoid


Two other types of quadric surfaces are the hyperboloid of one sheet, given by an
equation of the form:

$$\frac{x^2}{a^2} + \frac{y^2}{b^2} - \frac{z^2}{c^2} = 1$$    (1.35)

and the hyperboloid of two sheets, whose equation has the form:

$$\frac{x^2}{a^2} - \frac{y^2}{b^2} - \frac{z^2}{c^2} = 1$$    (1.36)
For the hyperboloid of one sheet, the trace in any plane parallel to the xy-plane is an
ellipse. The traces in the planes parallel to the xz- or yz-planes are hyperbolas (see Figure
1.6.5), except for the special cases x = ±a and y = ±b; in those planes the traces are pairs of
intersecting lines (see Exercise 8).

For the hyperboloid of two sheets, the trace in any plane parallel to the xy- or xz-plane is
a hyperbola (see Figure 1.6.6). There is no trace in the yz-plane. In any plane parallel to the
yz-plane for which |x| > |a|, the trace is an ellipse.


The elliptic paraboloid is another type of quadric surface,
whose equation has the form:

$$\frac{x^2}{a^2} + \frac{y^2}{b^2} = \frac{z}{c}$$    (1.37)

The traces in planes parallel to the xy-plane are ellipses, though
in the xy-plane itself the trace is a single point. The traces in
planes parallel to the xz- or yz-planes are parabolas. Figure
1.6.7 shows the case where c > 0. When c < 0 the surface is
turned downward. In the case where a = b, the surface is called
a paraboloid of revolution, which is often used as a reflecting surface,
e.g. in vehicle headlights.¹⁰

A more complicated quadric surface is the hyperbolic paraboloid, given by:

$$\frac{x^2}{a^2} - \frac{y^2}{b^2} = \frac{z}{c}$$    (1.38)

Figure 1.6.8 Hyperbolic paraboloid


¹⁰ For a discussion of this see pp. 157-158 in Hecht.
The hyperbolic paraboloid can be tricky to draw; using graphing software on a computer
can make it easier. For example, Figure 1.6.8 was created using the free Gnuplot package
(see Appendix C). It shows the graph of the hyperbolic paraboloid $z = y^2 - x^2$, which is the
special case where a = b = 1 and c = -1 in equation (1.38). The mesh lines on the surface are
the traces in planes parallel to the coordinate planes. In a plane y = k parallel to the xz-plane
the trace is $z = k^2 - x^2$, a parabola pointing downward, while in a plane x = k parallel
to the yz-plane the trace is $z = y^2 - k^2$, a parabola pointing upward. Also, notice that the traces in planes
parallel to the xy-plane are hyperbolas, though in the xy-plane itself the trace is a pair of
intersecting lines through the origin. This is true in general when c < 0 in equation (1.38).
When c > 0, the surface would be similar to that in Figure 1.6.8, only rotated 90° around
the z-axis and the nature of the traces in planes parallel to the xz- or yz-planes would be
reversed.


The last type of quadric surface that we will consider is the
elliptic cone, which has an equation of the form:

$$\frac{x^2}{a^2} + \frac{y^2}{b^2} - \frac{z^2}{c^2} = 0$$    (1.39)

Figure 1.6.9 Elliptic cone


The traces in planes parallel to the xy-plane are ellipses, except
in the xy-plane itself where the trace is a single point.

The traces in planes parallel to the xz- or yz-planes are hyperbolas,
except in the xz- and yz-planes themselves where the
traces are pairs of intersecting lines.

Notice that every point on the elliptic cone is on a line which 
lies entirely on the surface; in Figure 1.6.9 these lines all go 
through the origin. This makes the elliptic cone an example of 
a ruled surface. The cylinder is also a ruled surface. 

What may not be as obvious is that both the hyperboloid of one sheet and the hyperbolic 
paraboloid are ruled surfaces. In fact, on both surfaces there are two lines through each 
point on the surface (see Exercises 11-12). Such surfaces are called doubly ruled surfaces, 
and the pairs of lines are called a regulus. 

It is clear that for each of the six types of quadric surfaces that we discussed, the surface
can be translated away from the origin (e.g. by replacing $x^2$ by $(x - x_0)^2$ in its equation). It can
be proved¹¹ that every quadric surface can be translated and/or rotated so that its equation
matches one of the six types that we described. For example, z = 2xy is a case of equation
(1.33) with “mixed” variables, e.g. with D ≠ 0 so that we get an xy term. This equation does
not match any of the types we considered. However, by rotating the x- and y-axes by 45° in
the xy-plane by means of the coordinate transformation $x = (x' - y')/\sqrt2$, $y = (x' + y')/\sqrt2$, $z = z'$,
then z = 2xy becomes the hyperbolic paraboloid $z' = (x')^2 - (y')^2$ in the $(x', y', z')$ coordinate
system. That is, z = 2xy is a hyperbolic paraboloid as in equation (1.38), but rotated 45° in
the xy-plane.
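
The effect of the 45° rotation can be spot-checked numerically: points chosen on $z' = (x')^2 - (y')^2$ should satisfy z = 2xy after the coordinate transformation. The sketch below uses randomly chosen sample points; the function name `rotate_45`, the random seed, and the sampling range are choices made here for illustration.

```python
import math
import random


def rotate_45(xp, yp, zp):
    """Map (x', y', z') to (x, y, z) with x = (x'-y')/sqrt(2), y = (x'+y')/sqrt(2), z = z'."""
    s = math.sqrt(2)
    return (xp - yp) / s, (xp + yp) / s, zp


random.seed(0)
for _ in range(5):
    xp, yp = random.uniform(-3, 3), random.uniform(-3, 3)
    zp = xp**2 - yp**2                   # a point on z' = (x')^2 - (y')^2
    x, y, z = rotate_45(xp, yp, zp)
    assert abs(z - 2 * x * y) < 1e-12    # the same point satisfies z = 2xy
print("all sample points satisfy z = 2xy")
```

This is only a numerical spot check, not a proof; the algebra behind it is $2xy = (x' - y')(x' + y') = (x')^2 - (y')^2$.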


¹¹ See Ch. 7 in Pogorelov.
Exercises


A

For Exercises 1-4, determine if the given equation describes a sphere. If so, find its radius
and center.

1. $x^2 + y^2 + z^2 - 4x - 6y - 10z + 37 = 0$

2. $x^2 + y^2 + z^2 + 2x - 2y - 8z + 19 = 0$

3. $2x^2 + 2y^2 + 2z^2 + 4x + 4y + 4z - 44 = 0$

4. $x^2 + y^2 - z^2 + 12x + 2y - 4z + 32 = 0$

5. Find the point(s) of intersection of the sphere $(x - 3)^2 + (y + 1)^2 + (z - 3)^2 = 9$ and the line
x = -1 + 2t, y = -2 - 3t, z = 3 + t.

B

6. Find the intersection of the spheres $x^2 + y^2 + z^2 = 9$ and $(x - 4)^2 + (y + 2)^2 + (z - 4)^2 = 9$.

7. Find the intersection of the sphere $x^2 + y^2 + z^2 = 9$ and the cylinder $x^2 + y^2 = 4$.

8. Find the trace of the hyperboloid of one sheet $\frac{x^2}{a^2} + \frac{y^2}{b^2} - \frac{z^2}{c^2} = 1$ in the plane x = a, and the
trace in the plane y = b.

9. Find the trace of the hyperbolic paraboloid $\frac{x^2}{a^2} - \frac{y^2}{b^2} = \frac{z}{c}$ in the xy-plane.

C

10. It can be shown that any four noncoplanar points (i.e. points that do not lie in the same
plane) determine a sphere.¹² Find the equation of the sphere that passes through the
points (0,0,0), (0,0,2), (1,-4,3) and (0,-1,3). (Hint: Equation (1.31))

11. Show that the hyperboloid of one sheet is a doubly ruled surface, i.e. each point on
the surface is on two lines lying entirely on the surface. (Hint: Write equation (1.35) as
$\frac{x^2}{a^2} - \frac{z^2}{c^2} = 1 - \frac{y^2}{b^2}$, factor each side. Recall that two planes intersect in a line.)

12. Show that the hyperbolic paraboloid is a doubly ruled surface. (Hint: Exercise 11)

13. Let S be the sphere with radius 1 centered at (0,0,1),
and let S* be S without the “north pole” point (0,0,2). Let
(a,b,c) be an arbitrary point on S*. Then the line passing
through (0,0,2) and (a,b,c) intersects the xy-plane at some
point (x,y,0), as in Figure 1.6.10. Find this point (x,y,0) in
terms of a, b and c.

(Note: Every point in the xy-plane can be matched with a
point on S*, and vice versa, in this manner. This method is
called stereographic projection, which essentially identifies
all of $\mathbb{R}^2$ with a “punctured” sphere.)

Figure 1.6.10

¹² See Welchons and Krickenberger, p. 160, for a proof.
1.7 Curvilinear Coordinates


The Cartesian coordinates of a point (x,y,z) are determined by
following straight paths starting from the origin: first along the
x-axis, then parallel to the y-axis, then parallel to the z-axis, as
in Figure 1.7.1. In curvilinear coordinate systems, these paths can
be curved. The two types of curvilinear coordinates which we will
consider are cylindrical and spherical coordinates. Instead of referencing
a point in terms of sides of a rectangular parallelepiped,
as with Cartesian coordinates, we will think of the point as lying
on a cylinder or sphere. Cylindrical coordinates are often used when there is symmetry
around the z-axis; spherical coordinates are useful when there is symmetry about the origin.

Let P = (x,y,z) be a point in Cartesian coordinates in $\mathbb{R}^3$, and let $P_0 = (x,y,0)$ be the
projection of P upon the xy-plane. Treating (x,y) as a point in $\mathbb{R}^2$, let $(r,\theta)$ be its polar
coordinates (see Figure 1.7.2). Let $\rho$ be the length of the line segment from the origin to P,
and let $\phi$ be the angle between that line segment and the positive z-axis (see Figure 1.7.3).
$\phi$ is called the zenith angle. Then the cylindrical coordinates $(r,\theta,z)$ and the spherical
coordinates $(\rho,\theta,\phi)$ of P(x,y,z) are defined as follows:¹³



Cylindrical coordinates $(r,\theta,z)$:

    $x = r\cos\theta$        $r = \sqrt{x^2 + y^2}$
    $y = r\sin\theta$        $\theta = \tan^{-1}\left(\frac{y}{x}\right)$
    $z = z$                  $z = z$

where $0 \le \theta \le \pi$ if $y \ge 0$ and $\pi < \theta < 2\pi$ if $y < 0$

Figure 1.7.2 Cylindrical coordinates


Spherical coordinates $(\rho,\theta,\phi)$:

    $x = \rho\sin\phi\cos\theta$        $\rho = \sqrt{x^2 + y^2 + z^2}$
    $y = \rho\sin\phi\sin\theta$        $\theta = \tan^{-1}\left(\frac{y}{x}\right)$
    $z = \rho\cos\phi$                  $\phi = \cos^{-1}\left(\frac{z}{\sqrt{x^2 + y^2 + z^2}}\right)$

where $0 \le \theta \le \pi$ if $y \ge 0$ and $\pi < \theta < 2\pi$ if $y < 0$

Figure 1.7.3 Spherical coordinates

Both $\theta$ and $\phi$ are measured in radians. Note that $r \ge 0$, $0 \le \theta < 2\pi$, $\rho \ge 0$ and $0 \le \phi \le \pi$.
Also, $\theta$ is undefined when (x,y) = (0,0), and $\phi$ is undefined when (x,y,z) = (0,0,0).


¹³ This “standard” definition of spherical coordinates used by mathematicians results in a left-handed system.
For this reason, physicists usually switch the definitions of $\theta$ and $\phi$ to make $(\rho,\theta,\phi)$ a right-handed system.
Example 1.31. Convert the point $(-2,-2,1)$ from Cartesian coordinates to (a) cylindrical
and (b) spherical coordinates.


Solution: (a) $r = \sqrt{(-2)^2 + (-2)^2} = 2\sqrt{2}$, $\theta = \tan^{-1}\left(\frac{-2}{-2}\right) = \tan^{-1}(1) = \frac{5\pi}{4}$, since $y = -2 < 0$.

$\therefore\ (r,\theta,z) = \left(2\sqrt{2}, \frac{5\pi}{4}, 1\right)$

(b) $\rho = \sqrt{(-2)^2 + (-2)^2 + 1^2} = \sqrt{9} = 3$, $\phi = \cos^{-1}\left(\frac{1}{3}\right) \approx 1.23$ radians.

$\therefore\ (\rho,\theta,\phi) = \left(3, \frac{5\pi}{4}, 1.23\right)$
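
The conversion in Example 1.31 can be sketched numerically. The following is a minimal Python sketch, not part of the original text; the function names `to_cylindrical` and `to_spherical` are illustrative. It uses `math.atan2`, which picks the correct quadrant for $\theta$ automatically, and reduces the result to $[0, 2\pi)$ to match the convention above.

```python
import math

def to_cylindrical(x, y, z):
    """Return (r, theta, z) with 0 <= theta < 2*pi; theta is meaningless if x = y = 0."""
    r = math.hypot(x, y)
    theta = math.atan2(y, x) % (2 * math.pi)  # atan2 handles the quadrant
    return r, theta, z

def to_spherical(x, y, z):
    """Return (rho, theta, phi); assumes (x, y, z) is not the origin."""
    rho = math.sqrt(x * x + y * y + z * z)
    theta = math.atan2(y, x) % (2 * math.pi)
    phi = math.acos(z / rho)  # zenith angle, 0 <= phi <= pi
    return rho, theta, phi

print(to_cylindrical(-2, -2, 1))
print(to_spherical(-2, -2, 1))
```

For the point $(-2,-2,1)$ this reproduces $r = 2\sqrt{2}$, $\theta = 5\pi/4$, $\rho = 3$ and $\phi \approx 1.23$ up to floating-point rounding.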


For cylindrical coordinates $(r,\theta,z)$, and constants $r_0$, $\theta_0$ and $z_0$, we see from Figure 1.7.4
that the surface $r = r_0$ is a cylinder of radius $r_0$ centered along the z-axis, the surface $\theta = \theta_0$
is a half-plane emanating from the z-axis, and the surface $z = z_0$ is a plane parallel to the
xy-plane.



Figure 1.7.4 Cylindrical coordinate surfaces 


For spherical coordinates $(\rho,\theta,\phi)$, and constants $\rho_0$, $\theta_0$ and $\phi_0$, we see from Figure 1.7.5
that the surface $\rho = \rho_0$ is a sphere of radius $\rho_0$ centered at the origin, the surface $\theta = \theta_0$ is a
half-plane emanating from the z-axis, and the surface $\phi = \phi_0$ is a circular cone whose vertex
is at the origin.





(c) $\phi = \phi_0$


Figure 1.7.5 Spherical coordinate surfaces 


Figures 1.7.4(a) and 1.7.5(a) show how these coordinate systems got their names. 
Sometimes the equation of a surface in Cartesian coordinates can be transformed into a
simpler equation in some other coordinate system, as in the following example.

Example 1.32. Write the equation of the cylinder $x^2 + y^2 = 4$ in cylindrical coordinates.
Solution: Since $r = \sqrt{x^2 + y^2}$, the equation in cylindrical coordinates is $r = 2$.

Using spherical coordinates to write the equation of a sphere does not necessarily make 
the equation simpler, if the sphere is not centered at the origin. 

Example 1.33. Write the equation $(x - 2)^2 + (y - 1)^2 + z^2 = 9$ in spherical coordinates.
Solution: Multiplying the equation out gives

$$x^2 + y^2 + z^2 - 4x - 2y + 5 = 9 \text{ , so we get}$$

$$\rho^2 - 4\rho\sin\phi\cos\theta - 2\rho\sin\phi\sin\theta - 4 = 0 \text{ , or}$$
$$\rho^2 - 2\sin\phi\,(2\cos\theta + \sin\theta)\,\rho - 4 = 0$$

after combining terms. Note that this actually makes it more difficult to figure out what the
surface is, as opposed to the Cartesian equation where you could immediately identify the
surface as a sphere of radius 3 centered at (2,1,0).
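
As a sanity check on the algebra, the combined equation can be evaluated at points known to lie on the sphere. This is a hedged sketch (the helper name `spherical_residual` is illustrative, not from the text): factoring $-4\rho\sin\phi\cos\theta - 2\rho\sin\phi\sin\theta$ gives $-2\sin\phi\,(2\cos\theta + \sin\theta)\,\rho$, and the residual below should vanish on the surface.

```python
import math

def spherical_residual(x, y, z):
    """Evaluate rho^2 - 2 sin(phi) (2 cos(theta) + sin(theta)) rho - 4 at a Cartesian point."""
    rho = math.sqrt(x * x + y * y + z * z)
    theta = math.atan2(y, x)
    phi = math.acos(z / rho)
    return rho**2 - 2 * math.sin(phi) * (2 * math.cos(theta) + math.sin(theta)) * rho - 4

# Three points on the sphere (x-2)^2 + (y-1)^2 + z^2 = 9
for point in [(5, 1, 0), (2, 1, 3), (2, 4, 0)]:
    print(point, spherical_residual(*point))
```

Each printed residual should be zero up to rounding error.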


Example 1.34. Describe the surface given by $\theta = z$ in cylindrical coordinates.

Solution: This surface is called a helicoid. As the (vertical) z coordinate increases, so does
the angle $\theta$, while the radius $r$ is unrestricted. So this sweeps out a (ruled!) surface shaped
like a spiral staircase, where the spiral has an infinite radius. Figure 1.7.6 shows a section
of this surface restricted to $0 \le z \le 4\pi$ and $0 \le r \le 2$.



Figure 1.7.6 Helicoid $\theta = z$
Exercises


A

For Exercises 1-4, find the (a) cylindrical and (b) spherical coordinates of the point whose
Cartesian coordinates are given.

1. $(2, 2\sqrt{3}, -1)$    2. $(-5, 5, 6)$    3. $(\sqrt{21}, -\sqrt{7}, 0)$    4. $(0, \sqrt{2}, 2)$

For Exercises 5-7, write the given equation in (a) cylindrical and (b) spherical coordinates.
5. $x^2 + y^2 + z^2 = 25$    6. $x^2 + y^2 = 2y$    7. $x^2 + y^2 + 9z^2 = 36$


B 

8. Describe the intersection of the surfaces whose equations in spherical coordinates are 
0 = | and 0= |. 

9. Show that for $a \ne 0$, the equation $\rho = 2a\sin\phi\cos\theta$ in spherical coordinates describes a
sphere centered at $(a,0,0)$ with radius $|a|$.

C

10. Let $P = (a,\theta,\phi)$ be a point in spherical coordinates, with $a > 0$ and $0 < \phi < \pi$. Then $P$
lies on the sphere $\rho = a$. Since $0 < \phi < \pi$, the line segment from the origin to $P$ can be
extended to intersect the cylinder given by $r = a$ (in cylindrical coordinates). Find the
cylindrical coordinates of that point of intersection.

11. Let $P_1$ and $P_2$ be points whose spherical coordinates are $(\rho_1,\theta_1,\phi_1)$ and $(\rho_2,\theta_2,\phi_2)$,
respectively. Let $\mathbf{v}_1$ be the vector from the origin to $P_1$, and let $\mathbf{v}_2$ be the vector from the
origin to $P_2$. For the angle $\gamma$ between $\mathbf{v}_1$ and $\mathbf{v}_2$, show that

$$\cos\gamma = \cos\phi_1\cos\phi_2 + \sin\phi_1\sin\phi_2\cos(\theta_2 - \theta_1).$$

This formula is used in electrodynamics to prove the addition theorem for spherical
harmonics, which provides a general expression for the electrostatic potential at a point due
to a unit charge. See pp. 100-102 in JACKSON.

12. Show that the distance $d$ between the points $P_1$ and $P_2$ with cylindrical coordinates
$(r_1,\theta_1,z_1)$ and $(r_2,\theta_2,z_2)$, respectively, is

$$d = \sqrt{r_1^2 + r_2^2 - 2r_1r_2\cos(\theta_2 - \theta_1) + (z_2 - z_1)^2}.$$


13. Show that the distance $d$ between the points $P_1$ and $P_2$ with spherical coordinates
$(\rho_1,\theta_1,\phi_1)$ and $(\rho_2,\theta_2,\phi_2)$, respectively, is

$$d = \sqrt{\rho_1^2 + \rho_2^2 - 2\rho_1\rho_2\left[\sin\phi_1\sin\phi_2\cos(\theta_2 - \theta_1) + \cos\phi_1\cos\phi_2\right]}.$$
1.8 Vector-Valued Functions


Now that we are familiar with vectors and their operations, we can begin discussing
functions whose values are vectors.


Definition 1.10. A vector-valued function of a real variable is a rule that associates a
vector $\mathbf{f}(t)$ with a real number $t$, where $t$ is in some subset $D$ of $\mathbb{R}^1$ (called the domain of $\mathbf{f}$).
We write $\mathbf{f} : D \to \mathbb{R}^3$ to denote that $\mathbf{f}$ is a mapping of $D$ into $\mathbb{R}^3$.


For example, $\mathbf{f}(t) = t\,\mathbf{i} + t^2\,\mathbf{j} + t^3\,\mathbf{k}$ is a vector-valued function in $\mathbb{R}^3$, defined for all real
numbers $t$. We would write $\mathbf{f} : \mathbb{R} \to \mathbb{R}^3$. At $t = 1$ the value of the function is the vector $\mathbf{i} + \mathbf{j} + \mathbf{k}$,
which in Cartesian coordinates has the terminal point (1,1,1).

A vector-valued function of a real variable can be written in component form as

$$\mathbf{f}(t) = f_1(t)\,\mathbf{i} + f_2(t)\,\mathbf{j} + f_3(t)\,\mathbf{k}$$

or in the form

$$\mathbf{f}(t) = (f_1(t), f_2(t), f_3(t))$$

for some real-valued functions $f_1(t)$, $f_2(t)$, $f_3(t)$, called the component functions of $\mathbf{f}$. The first
form is often used when emphasizing that $\mathbf{f}(t)$ is a vector, and the second form is useful
when considering just the terminal points of the vectors. By identifying vectors with their
terminal points, a curve in space can be written as a vector-valued function.


Example 1.35. Define $\mathbf{f} : \mathbb{R} \to \mathbb{R}^3$ by $\mathbf{f}(t) = (\cos t, \sin t, t)$.

This is the equation of a helix (see Figure 1.8.1). As the value of
$t$ increases, the terminal points of $\mathbf{f}(t)$ trace out a curve spiraling
upward. For each $t$, the x- and y-coordinates of $\mathbf{f}(t)$ are $x = \cos t$
and $y = \sin t$, so

$$x^2 + y^2 = \cos^2 t + \sin^2 t = 1.$$

Thus, the curve lies on the surface of the right circular cylinder
$x^2 + y^2 = 1$.



It may help to think of vector-valued functions of a real variable in $\mathbb{R}^3$ as a generalization
of the parametric functions in $\mathbb{R}^2$ which you learned about in single-variable calculus. Much
of the theory of real-valued functions of a single real variable can be applied to vector-valued
functions of a real variable. Since each of the three component functions is real-valued, it
will sometimes be the case that results from single-variable calculus can simply be applied
to each of the component functions to yield a similar result for the vector-valued function.
However, there are times when such generalizations do not hold (see Exercise 13). The
concept of a limit, though, can be extended naturally to vector-valued functions, as in the
following definition.
Definition 1.11. Let $\mathbf{f}(t)$ be a vector-valued function, let $a$ be a real number and let $\mathbf{c}$ be a
vector. Then we say that the limit of $\mathbf{f}(t)$ as $t$ approaches $a$ equals $\mathbf{c}$, written as $\lim_{t\to a}\mathbf{f}(t) = \mathbf{c}$,
if $\lim_{t\to a}\|\mathbf{f}(t) - \mathbf{c}\| = 0$. If $\mathbf{f}(t) = (f_1(t), f_2(t), f_3(t))$, then

$$\lim_{t\to a}\mathbf{f}(t) = \left(\lim_{t\to a} f_1(t),\ \lim_{t\to a} f_2(t),\ \lim_{t\to a} f_3(t)\right)$$

provided that all three limits on the right side exist.


The above definition shows that continuity and the derivative of a vector-valued function
can also be defined in terms of its component functions.


Definition 1.12. Let $\mathbf{f}(t) = (f_1(t), f_2(t), f_3(t))$ be a vector-valued function, and let $a$ be a real
number in its domain. Then $\mathbf{f}(t)$ is continuous at $a$ if $\lim_{t\to a}\mathbf{f}(t) = \mathbf{f}(a)$. Equivalently, $\mathbf{f}(t)$ is
continuous at $a$ if and only if $f_1(t)$, $f_2(t)$, and $f_3(t)$ are continuous at $a$.

The derivative of $\mathbf{f}(t)$ at $a$, denoted by $\mathbf{f}'(a)$ or $\frac{d\mathbf{f}}{dt}(a)$, is the limit

$$\mathbf{f}'(a) = \lim_{h\to 0}\frac{\mathbf{f}(a + h) - \mathbf{f}(a)}{h}$$

if that limit exists. Equivalently, $\mathbf{f}'(a) = (f_1'(a), f_2'(a), f_3'(a))$, if the component derivatives
exist. We say that $\mathbf{f}(t)$ is differentiable at $a$ if $\mathbf{f}'(a)$ exists.


Recall that the derivative of a real-valued function of a single variable is a real number, 
representing the slope of the tangent line to the graph of the function at a point. Similarly, 
the derivative of a vector-valued function is a tangent vector to the curve in space which 
the function represents, and it lies on the tangent line to the curve (see Figure 1.8.2). 



Figure 1.8.2 Tangent vector $\mathbf{f}'(a)$ and tangent line $L = \mathbf{f}(a) + s\,\mathbf{f}'(a)$


Example 1.36. Let $\mathbf{f}(t) = (\cos t, \sin t, t)$. Then $\mathbf{f}'(t) = (-\sin t, \cos t, 1)$ for all $t$. The tangent line
$L$ to the curve at $\mathbf{f}(2\pi) = (1,0,2\pi)$ is $L = \mathbf{f}(2\pi) + s\,\mathbf{f}'(2\pi) = (1,0,2\pi) + s(0,1,1)$, or in parametric
form: $x = 1$, $y = s$, $z = 2\pi + s$ for $-\infty < s < \infty$.
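
The derivative in Definition 1.12 can be approximated numerically, component by component. The sketch below is an illustration only (the helper names `derivative` and `tangent_line` are assumptions, and a central difference is used in place of the exact limit); it checks the tangent vector and tangent line found in Example 1.36.

```python
import math

def f(t):
    """The helix f(t) = (cos t, sin t, t)."""
    return (math.cos(t), math.sin(t), t)

def derivative(func, a, h=1e-6):
    """Central-difference approximation of f'(a), taken componentwise."""
    fwd, back = func(a + h), func(a - h)
    return tuple((p - q) / (2 * h) for p, q in zip(fwd, back))

def tangent_line(func, a, s):
    """Point f(a) + s f'(a) on the (approximate) tangent line at f(a)."""
    point, direction = func(a), derivative(func, a)
    return tuple(p + s * d for p, d in zip(point, direction))

print(derivative(f, 2 * math.pi))         # close to (0, 1, 1)
print(tangent_line(f, 2 * math.pi, 1.0))  # close to (1, 1, 2*pi + 1)
```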
A scalar function is a real-valued function. Note that if $u(t)$ is a scalar function and
$\mathbf{f}(t)$ is a vector-valued function, then their product, defined by $(u\mathbf{f})(t) = u(t)\,\mathbf{f}(t)$ for all $t$, is a
vector-valued function (since the product of a scalar with a vector is a vector).

The basic properties of derivatives of vector-valued functions are summarized in the
following theorem.


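Theorem 1.20. Let $\mathbf{f}(t)$ and $\mathbf{g}(t)$ be differentiable vector-valued functions, let $u(t)$ be a
differentiable scalar function, let $k$ be a scalar, and let $\mathbf{c}$ be a constant vector. Then

(a) $\dfrac{d}{dt}(\mathbf{c}) = \mathbf{0}$

(b) $\dfrac{d}{dt}(k\mathbf{f}) = k\,\dfrac{d\mathbf{f}}{dt}$

(c) $\dfrac{d}{dt}(\mathbf{f} + \mathbf{g}) = \dfrac{d\mathbf{f}}{dt} + \dfrac{d\mathbf{g}}{dt}$

(d) $\dfrac{d}{dt}(\mathbf{f} - \mathbf{g}) = \dfrac{d\mathbf{f}}{dt} - \dfrac{d\mathbf{g}}{dt}$

(e) $\dfrac{d}{dt}(u\mathbf{f}) = \dfrac{du}{dt}\,\mathbf{f} + u\,\dfrac{d\mathbf{f}}{dt}$

(f) $\dfrac{d}{dt}(\mathbf{f}\cdot\mathbf{g}) = \dfrac{d\mathbf{f}}{dt}\cdot\mathbf{g} + \mathbf{f}\cdot\dfrac{d\mathbf{g}}{dt}$

(g) $\dfrac{d}{dt}(\mathbf{f}\times\mathbf{g}) = \dfrac{d\mathbf{f}}{dt}\times\mathbf{g} + \mathbf{f}\times\dfrac{d\mathbf{g}}{dt}$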


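Proof: The proofs of parts (a)-(e) follow easily by differentiating the component functions
and using the rules for derivatives from single-variable calculus. We will prove part (f),
and leave the proof of part (g) as an exercise for the reader.

(f) Write $\mathbf{f}(t) = (f_1(t), f_2(t), f_3(t))$ and $\mathbf{g}(t) = (g_1(t), g_2(t), g_3(t))$, where the component functions
$f_1(t)$, $f_2(t)$, $f_3(t)$, $g_1(t)$, $g_2(t)$, $g_3(t)$ are all differentiable real-valued functions. Then

$$\begin{aligned}
\frac{d}{dt}(\mathbf{f}(t)\cdot\mathbf{g}(t)) &= \frac{d}{dt}\left(f_1(t)g_1(t) + f_2(t)g_2(t) + f_3(t)g_3(t)\right) \\
&= \frac{d}{dt}(f_1(t)g_1(t)) + \frac{d}{dt}(f_2(t)g_2(t)) + \frac{d}{dt}(f_3(t)g_3(t)) \\
&= \frac{df_1}{dt}(t)\,g_1(t) + f_1(t)\,\frac{dg_1}{dt}(t) + \frac{df_2}{dt}(t)\,g_2(t) + f_2(t)\,\frac{dg_2}{dt}(t) + \frac{df_3}{dt}(t)\,g_3(t) + f_3(t)\,\frac{dg_3}{dt}(t) \\
&= \left(\frac{df_1}{dt}(t), \frac{df_2}{dt}(t), \frac{df_3}{dt}(t)\right)\cdot(g_1(t), g_2(t), g_3(t)) \\
&\quad + (f_1(t), f_2(t), f_3(t))\cdot\left(\frac{dg_1}{dt}(t), \frac{dg_2}{dt}(t), \frac{dg_3}{dt}(t)\right) \\
&= \frac{d\mathbf{f}}{dt}(t)\cdot\mathbf{g}(t) + \mathbf{f}(t)\cdot\frac{d\mathbf{g}}{dt}(t) \quad\text{for all } t.
\end{aligned}$$


QED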
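Example 1.37. Suppose $\mathbf{f}(t)$ is differentiable. Find the derivative of $\|\mathbf{f}(t)\|$.

Solution: Since $\|\mathbf{f}(t)\|$ is a real-valued function of $t$, then by the Chain Rule for real-valued
functions, we know that $\dfrac{d}{dt}\|\mathbf{f}(t)\|^2 = 2\,\|\mathbf{f}(t)\|\,\dfrac{d}{dt}\|\mathbf{f}(t)\|$.

But $\|\mathbf{f}(t)\|^2 = \mathbf{f}(t)\cdot\mathbf{f}(t)$, so $\dfrac{d}{dt}\|\mathbf{f}(t)\|^2 = \dfrac{d}{dt}(\mathbf{f}(t)\cdot\mathbf{f}(t))$. Hence, we have

$$\begin{aligned}
2\,\|\mathbf{f}(t)\|\,\frac{d}{dt}\|\mathbf{f}(t)\| &= \frac{d}{dt}(\mathbf{f}(t)\cdot\mathbf{f}(t)) = \mathbf{f}'(t)\cdot\mathbf{f}(t) + \mathbf{f}(t)\cdot\mathbf{f}'(t) \quad\text{by Theorem 1.20(f), so} \\
&= 2\,\mathbf{f}'(t)\cdot\mathbf{f}(t) \text{ , so if } \|\mathbf{f}(t)\| \ne 0 \text{ then}
\end{aligned}$$

$$\frac{d}{dt}\|\mathbf{f}(t)\| = \frac{\mathbf{f}'(t)\cdot\mathbf{f}(t)}{\|\mathbf{f}(t)\|}.$$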


We know that $\|\mathbf{f}(t)\|$ is constant if and only if $\dfrac{d}{dt}\|\mathbf{f}(t)\| = 0$ for all $t$. Also, $\mathbf{f}(t) \perp \mathbf{f}'(t)$ if and
only if $\mathbf{f}'(t)\cdot\mathbf{f}(t) = 0$. Thus, the above example shows this important fact:


If $\|\mathbf{f}(t)\| \ne 0$, then $\|\mathbf{f}(t)\|$ is constant if and only if $\mathbf{f}(t) \perp \mathbf{f}'(t)$ for all $t$.


This means that if a curve lies completely on a sphere (or circle) centered at the origin, then
the tangent vector $\mathbf{f}'(t)$ is always perpendicular to the position vector $\mathbf{f}(t)$.


Example 1.38. The spherical spiral $\mathbf{f}(t) = \left(\dfrac{\cos t}{\sqrt{1 + a^2t^2}},\ \dfrac{\sin t}{\sqrt{1 + a^2t^2}},\ \dfrac{-at}{\sqrt{1 + a^2t^2}}\right)$, for $a \ne 0$.

Figure 1.8.3 shows the graph of the curve when $a = 0.2$. In the exercises, the reader will be
asked to show that this curve lies on the sphere $x^2 + y^2 + z^2 = 1$ and to verify directly that
$\mathbf{f}'(t)\cdot\mathbf{f}(t) = 0$ for all $t$.



Figure 1.8.3 Spherical spiral with $a = 0.2$
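
The two claims about the spherical spiral can be spot-checked numerically before proving them. This is only a numerical illustration under stated assumptions, not a proof: `spiral` and `spiral_prime` are hypothetical helper names, and the derivative is a central-difference approximation.

```python
import math

def spiral(t, a=0.2):
    """Spherical spiral f(t) from Example 1.38."""
    d = math.sqrt(1 + a * a * t * t)
    return (math.cos(t) / d, math.sin(t) / d, -a * t / d)

def spiral_prime(t, a=0.2, h=1e-6):
    """Central-difference approximation of f'(t)."""
    fwd, back = spiral(t + h, a), spiral(t - h, a)
    return tuple((p - q) / (2 * h) for p, q in zip(fwd, back))

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

for t in (-3.0, 0.0, 1.5, 10.0):
    f_t = spiral(t)
    # ||f(t)|| should be 1 and f'(t) . f(t) should be (nearly) 0
    print(t, math.sqrt(dot(f_t, f_t)), dot(spiral_prime(t), f_t))
```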

























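Just as in single-variable calculus, higher-order derivatives of vector-valued functions are
obtained by repeatedly differentiating the (first) derivative of the function:

$$\mathbf{f}''(t) = \frac{d}{dt}\,\mathbf{f}'(t), \qquad \mathbf{f}'''(t) = \frac{d}{dt}\,\mathbf{f}''(t), \qquad \ldots, \qquad \frac{d^n\mathbf{f}}{dt^n} = \frac{d}{dt}\left(\frac{d^{n-1}\mathbf{f}}{dt^{n-1}}\right) \quad (\text{for } n = 2,3,4,\ldots)$$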


We can use vector-valued functions to represent physical quantities, such as velocity,
acceleration, force, momentum, etc. For example, let the real variable $t$ represent time elapsed
from some initial time ($t = 0$), and suppose that an object of constant mass $m$ is subjected
to some force so that it moves in space, with its position $(x,y,z)$ at time $t$ a function of
$t$. That is, $x = x(t)$, $y = y(t)$, $z = z(t)$ for some real-valued functions $x(t)$, $y(t)$, $z(t)$. Call
$\mathbf{r}(t) = (x(t), y(t), z(t))$ the position vector of the object. We can define various physical
quantities associated with the object as follows: 14


position: $\mathbf{r}(t) = (x(t), y(t), z(t))$

velocity: $\mathbf{v}(t) = \dot{\mathbf{r}}(t) = \mathbf{r}'(t) = \dfrac{d\mathbf{r}}{dt} = (x'(t), y'(t), z'(t))$

acceleration: $\mathbf{a}(t) = \dot{\mathbf{v}}(t) = \mathbf{v}'(t) = \dfrac{d\mathbf{v}}{dt} = \ddot{\mathbf{r}}(t) = \mathbf{r}''(t) = \dfrac{d^2\mathbf{r}}{dt^2}$

momentum: $\mathbf{p}(t) = m\,\mathbf{v}(t)$

force: $\mathbf{F}(t) = \dot{\mathbf{p}}(t) = \mathbf{p}'(t) = \dfrac{d\mathbf{p}}{dt}$ (Newton’s Second Law of Motion)

The magnitude $\|\mathbf{v}(t)\|$ of the velocity vector is called the speed of the object. Note that since
the mass $m$ is a constant, the force equation becomes the familiar $\mathbf{F}(t) = m\,\mathbf{a}(t)$.


Example 1.39. Let $\mathbf{r}(t) = (5\cos t, 3\sin t, 4\sin t)$ be the position vector of an object at time $t \ge 0$.
Find its (a) velocity and (b) acceleration vectors.

Solution: (a) $\mathbf{v}(t) = \dot{\mathbf{r}}(t) = (-5\sin t, 3\cos t, 4\cos t)$

(b) $\mathbf{a}(t) = \dot{\mathbf{v}}(t) = (-5\cos t, -3\sin t, -4\sin t)$

Note that $\|\mathbf{r}(t)\| = \sqrt{25\cos^2 t + 25\sin^2 t} = 5$ for all $t$, so by Example 1.37 we know that
$\mathbf{r}(t)\cdot\dot{\mathbf{r}}(t) = 0$ for all $t$ (which we can verify from part (a)). In fact, $\|\mathbf{v}(t)\| = 5$ for all $t$ also. And not
only does $\mathbf{r}(t)$ lie on the sphere of radius 5 centered at the origin, but perhaps not so obvious
is that it lies completely within a circle of radius 5 centered at the origin. Also, note that
$\mathbf{a}(t) = -\mathbf{r}(t)$. It turns out (see Exercise 16) that whenever an object moves in a circle with
constant speed, the acceleration vector will point in the opposite direction of the position
vector (i.e. towards the center of the circle).
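
The observations in Example 1.39 are easy to confirm numerically. The sketch below simply codes the position, velocity and acceleration formulas from the example (the helper names `norm` and `dot` are illustrative) and evaluates them at a few sample times.

```python
import math

def r(t):
    return (5 * math.cos(t), 3 * math.sin(t), 4 * math.sin(t))

def v(t):
    return (-5 * math.sin(t), 3 * math.cos(t), 4 * math.cos(t))

def a(t):
    return (-5 * math.cos(t), -3 * math.sin(t), -4 * math.sin(t))

def dot(u, w):
    return sum(p * q for p, q in zip(u, w))

def norm(u):
    return math.sqrt(dot(u, u))

for t in (0.0, 0.7, 2.0, 5.5):
    # Expect ||r|| = 5, ||v|| = 5 and r . v = 0 (up to rounding)
    print(t, norm(r(t)), norm(v(t)), dot(r(t), v(t)))
```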


14 We will often use the older dot notation for derivatives when physics is involved.
Recall from Section 1.5 that if $\mathbf{r}_1$, $\mathbf{r}_2$ are position vectors to distinct points then $\mathbf{r}_1 + t(\mathbf{r}_2 - \mathbf{r}_1)$
represents a line through those two points as $t$ varies over all real numbers. That vector
sum can be written as $(1 - t)\mathbf{r}_1 + t\,\mathbf{r}_2$. So the function $\mathbf{l}(t) = (1 - t)\mathbf{r}_1 + t\,\mathbf{r}_2$ is a line through
the terminal points of $\mathbf{r}_1$ and $\mathbf{r}_2$, and when $t$ is restricted to the interval $[0,1]$ it is the line
segment between the points, with $\mathbf{l}(0) = \mathbf{r}_1$ and $\mathbf{l}(1) = \mathbf{r}_2$.

In general, a function of the form $\mathbf{f}(t) = (a_1t + b_1, a_2t + b_2, a_3t + b_3)$ represents a line in $\mathbb{R}^3$. A
function of the form $\mathbf{f}(t) = (a_1t^2 + b_1t + c_1, a_2t^2 + b_2t + c_2, a_3t^2 + b_3t + c_3)$ represents a (possibly
degenerate) parabola in $\mathbb{R}^3$.


Example 1.40. Bézier curves are used in Computer Aided Design (CAD) to approximate
the shape of a polygonal path in space (called the Bézier polygon or control polygon). For
instance, given three points (or position vectors) $\mathbf{b}_0$, $\mathbf{b}_1$, $\mathbf{b}_2$ in $\mathbb{R}^3$, define

$$\begin{aligned}
\mathbf{b}_0^1(t) &= (1 - t)\,\mathbf{b}_0 + t\,\mathbf{b}_1 \\
\mathbf{b}_1^1(t) &= (1 - t)\,\mathbf{b}_1 + t\,\mathbf{b}_2 \\
\mathbf{b}_0^2(t) &= (1 - t)\,\mathbf{b}_0^1(t) + t\,\mathbf{b}_1^1(t) \\
&= (1 - t)^2\,\mathbf{b}_0 + 2t(1 - t)\,\mathbf{b}_1 + t^2\,\mathbf{b}_2
\end{aligned}$$

for all real $t$. For $t$ in the interval $[0,1]$, we see that $\mathbf{b}_0^1(t)$ is the line segment between $\mathbf{b}_0$ and
$\mathbf{b}_1$, and $\mathbf{b}_1^1(t)$ is the line segment between $\mathbf{b}_1$ and $\mathbf{b}_2$. The function $\mathbf{b}_0^2(t)$ is the Bézier curve
for the points $\mathbf{b}_0$, $\mathbf{b}_1$, $\mathbf{b}_2$. Note from the last formula that the curve is a parabola that goes
through $\mathbf{b}_0$ (when $t = 0$) and $\mathbf{b}_2$ (when $t = 1$).

As an example, let $\mathbf{b}_0 = (0,0,0)$, $\mathbf{b}_1 = (1,2,3)$, and $\mathbf{b}_2 = (4,5,2)$. Then the explicit formula for
the Bézier curve is $\mathbf{b}_0^2(t) = (2t + 2t^2, 4t + t^2, 6t - 4t^2)$, as shown in Figure 1.8.4, where the line
segments are $\mathbf{b}_0^1(t)$ and $\mathbf{b}_1^1(t)$, and the curve is $\mathbf{b}_0^2(t)$.





Figure 1.8.4 Bezier curve approximation for three points 
In general, the polygonal path determined by $n \ge 3$ noncollinear points in $\mathbb{R}^3$ can be used
to define the Bézier curve recursively by a process called repeated linear interpolation. This
curve will be a vector-valued function whose components are polynomials of degree $n - 1$,
and its formula is given by de Casteljau’s algorithm. 15 In the exercises, the reader will be
given the algorithm for the case of $n = 4$ points and asked to write the explicit formula for
the Bézier curve for the four points shown in Figure 1.8.5.
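
Repeated linear interpolation is short to express in code. The following Python sketch is one possible rendering of the idea, not taken from the text or from Farin: the names `lerp` and `de_casteljau` are illustrative. It evaluates the Bézier curve at a single parameter value for any number of control points, and is checked here against the three-point formula of Example 1.40.

```python
def lerp(p, q, t):
    """Linear interpolation (1 - t) p + t q between two points."""
    return tuple((1 - t) * a + t * b for a, b in zip(p, q))

def de_casteljau(points, t):
    """Evaluate the Bezier curve for the given control points at parameter t."""
    pts = list(points)
    while len(pts) > 1:
        # Replace each consecutive pair of points by its interpolant
        pts = [lerp(pts[i], pts[i + 1], t) for i in range(len(pts) - 1)]
    return pts[0]

control = [(0, 0, 0), (1, 2, 3), (4, 5, 2)]
t = 0.5
print(de_casteljau(control, t))                        # numeric evaluation
print((2*t + 2*t**2, 4*t + t**2, 6*t - 4*t**2))        # explicit formula
```

Each pass of the loop reduces the number of points by one, so $n$ control points need $n - 1$ passes, which is where the degree $n - 1$ comes from.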



Figure 1.8.5 Bezier curve approximation for four points 


Exercises 

A 

For Exercises 1-4, calculate $\mathbf{f}'(t)$ and find the tangent line at $\mathbf{f}(0)$.

1. $\mathbf{f}(t) = (t + 1, t^2 + 1, t^3 + 1)$    2. $\mathbf{f}(t) = (e^t + 1, e^{2t} + 1, e^{t^2} + 1)$

3. $\mathbf{f}(t) = (\cos 2t, \sin 2t, t)$    4. $\mathbf{f}(t) = (\sin 2t, 2\sin^2 t, 2\cos t)$


For Exercises 5-6, find the velocity $\mathbf{v}(t)$ and acceleration $\mathbf{a}(t)$ of an object with the given
position vector $\mathbf{r}(t)$.

5. $\mathbf{r}(t) = (t, t - \sin t, 1 - \cos t)$    6. $\mathbf{r}(t) = (3\cos t, 2\sin t, 1)$


B 


7. Let $\mathbf{f}(t) = \left(\dfrac{\cos t}{\sqrt{1 + a^2t^2}},\ \dfrac{\sin t}{\sqrt{1 + a^2t^2}},\ \dfrac{-at}{\sqrt{1 + a^2t^2}}\right)$, with $a \ne 0$.

(a) Show that $\|\mathbf{f}(t)\| = 1$ for all $t$.

(b) Show directly that $\mathbf{f}'(t)\cdot\mathbf{f}(t) = 0$ for all $t$.

8. If $\mathbf{f}'(t) = \mathbf{0}$ for all $t$ in some interval $(a,b)$, show that $\mathbf{f}(t)$ is a constant vector in $(a,b)$.

15 See pp. 27-30 in Farin.
9. For a constant vector $\mathbf{c} \ne \mathbf{0}$, the function $\mathbf{f}(t) = t\,\mathbf{c}$ represents a line parallel to $\mathbf{c}$.

(a) What kind of curve does $\mathbf{g}(t) = t^3\,\mathbf{c}$ represent? Explain.

(b) What kind of curve does $\mathbf{h}(t) = e^t\,\mathbf{c}$ represent? Explain.

(c) Compare $\mathbf{f}'(0)$ and $\mathbf{g}'(0)$. Given your answer to part (a), how do you explain the
difference in the two derivatives?


10. Show that $\dfrac{d}{dt}\left(\mathbf{f}\times\dfrac{d\mathbf{f}}{dt}\right) = \mathbf{f}\times\dfrac{d^2\mathbf{f}}{dt^2}$.

11. Let a particle of (constant) mass $m$ have position vector $\mathbf{r}(t)$, velocity $\mathbf{v}(t)$, acceleration
$\mathbf{a}(t)$ and momentum $\mathbf{p}(t)$ at time $t$. The angular momentum $\mathbf{L}(t)$ of the particle with
respect to the origin at time $t$ is defined as $\mathbf{L}(t) = \mathbf{r}(t)\times\mathbf{p}(t)$. If $\mathbf{F}(t)$ is the force acting on
the particle at time $t$, then define the torque $\mathbf{N}(t)$ acting on the particle with respect to
the origin as $\mathbf{N}(t) = \mathbf{r}(t)\times\mathbf{F}(t)$. Show that $\mathbf{L}'(t) = \mathbf{N}(t)$.


12. Show that 





dt > 


13. The Mean Value Theorem does not hold for vector-valued functions: Show that for $\mathbf{f}(t) =
(\cos t, \sin t, t)$, there is no $t$ in the interval $(0, 2\pi)$ such that

$$\mathbf{f}'(t) = \frac{\mathbf{f}(2\pi) - \mathbf{f}(0)}{2\pi - 0}.$$

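C

14. The Bézier curve $\mathbf{b}_0^3(t)$ for four noncollinear points $\mathbf{b}_0$, $\mathbf{b}_1$, $\mathbf{b}_2$, $\mathbf{b}_3$ in $\mathbb{R}^3$ is defined by the
following algorithm (going from the left column to the right):

$$\begin{array}{lll}
\mathbf{b}_0^1(t) = (1 - t)\,\mathbf{b}_0 + t\,\mathbf{b}_1 & \mathbf{b}_0^2(t) = (1 - t)\,\mathbf{b}_0^1(t) + t\,\mathbf{b}_1^1(t) & \mathbf{b}_0^3(t) = (1 - t)\,\mathbf{b}_0^2(t) + t\,\mathbf{b}_1^2(t) \\
\mathbf{b}_1^1(t) = (1 - t)\,\mathbf{b}_1 + t\,\mathbf{b}_2 & \mathbf{b}_1^2(t) = (1 - t)\,\mathbf{b}_1^1(t) + t\,\mathbf{b}_2^1(t) & \\
\mathbf{b}_2^1(t) = (1 - t)\,\mathbf{b}_2 + t\,\mathbf{b}_3 & &
\end{array}$$

(a) Show that $\mathbf{b}_0^3(t) = (1 - t)^3\,\mathbf{b}_0 + 3t(1 - t)^2\,\mathbf{b}_1 + 3t^2(1 - t)\,\mathbf{b}_2 + t^3\,\mathbf{b}_3$.

(b) Write the explicit formula (as in Example 1.40) for the Bézier curve for the points
$\mathbf{b}_0 = (0,0,0)$, $\mathbf{b}_1 = (0,1,1)$, $\mathbf{b}_2 = (2,3,0)$, $\mathbf{b}_3 = (4,5,2)$.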

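15. Let $\mathbf{r}(t)$ be the position vector for a particle moving in $\mathbb{R}^3$. Show that

$$\frac{d}{dt}\left(\mathbf{r}\times(\mathbf{v}\times\mathbf{r})\right) = \|\mathbf{r}\|^2\,\mathbf{a} + (\mathbf{r}\cdot\mathbf{v})\,\mathbf{v} - \left(\|\mathbf{v}\|^2 + \mathbf{r}\cdot\mathbf{a}\right)\mathbf{r}.$$


16. Let $\mathbf{r}(t)$ be the position vector in $\mathbb{R}^3$ for a particle that moves with constant speed $c > 0$
in a circle of radius $a > 0$ in the xy-plane. Show that $\mathbf{a}(t)$ points in the opposite direction
as $\mathbf{r}(t)$ for all $t$. (Hint: Use Example 1.37 to show that $\mathbf{r}(t) \perp \mathbf{v}(t)$ and $\mathbf{a}(t) \perp \mathbf{v}(t)$, and hence
$\mathbf{a}(t) \parallel \mathbf{r}(t)$.)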


17. Prove Theorem 1.20(g). 
1.9 Arc Length


Let $\mathbf{r}(t) = (x(t), y(t), z(t))$ be the position vector of an object moving in $\mathbb{R}^3$. Since $\|\mathbf{v}(t)\|$ is the
speed of the object at time $t$, it seems natural to define the distance $s$ traveled by the object
from time $t = a$ to $t = b$ as the definite integral

$$s = \int_a^b \|\mathbf{v}(t)\|\,dt = \int_a^b \sqrt{x'(t)^2 + y'(t)^2 + z'(t)^2}\,dt, \qquad (1.40)$$

which is analogous to the case from single-variable calculus for parametric functions in $\mathbb{R}^2$.
This is indeed how we will define the distance traveled and, in general, the arc length of a
curve in $\mathbb{R}^3$.


Definition 1.13. Let $\mathbf{f}(t) = (x(t), y(t), z(t))$ be a curve in $\mathbb{R}^3$ whose domain includes the
interval $[a,b]$. Suppose that in the interval $(a,b)$ the first derivative of each component function
$x(t)$, $y(t)$ and $z(t)$ exists and is continuous, and that no section of the curve is repeated. Then
the arc length $L$ of the curve from $t = a$ to $t = b$ is

$$L = \int_a^b \|\mathbf{f}'(t)\|\,dt = \int_a^b \sqrt{x'(t)^2 + y'(t)^2 + z'(t)^2}\,dt \qquad (1.41)$$

A real-valued function whose first derivative is continuous is called continuously
differentiable (or a $C^1$ function), and a function whose derivatives of all orders are continuous
is called smooth (or a $C^\infty$ function). All the functions we will consider will be smooth. A
smooth curve $\mathbf{f}(t)$ is one whose derivative $\mathbf{f}'(t)$ is never the zero vector and whose component
functions are all smooth.

Note that we did not prove that the formula in the above definition actually gives the
length of a section of a curve. A rigorous proof requires dealing with some subtleties,
normally glossed over in calculus texts, which are beyond the scope of this book. 16


Example 1.41. Find the length $L$ of the helix $\mathbf{f}(t) = (\cos t, \sin t, t)$ from $t = 0$ to $t = 2\pi$.
Solution: By formula (1.41), we have

$$\begin{aligned}
L &= \int_0^{2\pi}\sqrt{(-\sin t)^2 + (\cos t)^2 + 1^2}\,dt = \int_0^{2\pi}\sqrt{\sin^2 t + \cos^2 t + 1}\,dt = \int_0^{2\pi}\sqrt{2}\,dt \\
&= \sqrt{2}\,(2\pi - 0) = 2\sqrt{2}\,\pi
\end{aligned}$$
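
Formula (1.41) can also be evaluated numerically when no closed form is at hand. As a minimal sketch (the function name `arc_length` and the choice of the composite midpoint rule are assumptions made for illustration), the code below integrates the speed of the helix and compares the result with $2\sqrt{2}\,\pi$.

```python
import math

def helix_speed(t):
    """||f'(t)|| for the helix f(t) = (cos t, sin t, t)."""
    return math.sqrt(math.sin(t) ** 2 + math.cos(t) ** 2 + 1)

def arc_length(speed_fn, a, b, n=1000):
    """Approximate the integral of speed_fn over [a, b] by the midpoint rule."""
    h = (b - a) / n
    return h * sum(speed_fn(a + (i + 0.5) * h) for i in range(n))

print(arc_length(helix_speed, 0.0, 2 * math.pi))  # numeric value
print(2 * math.sqrt(2) * math.pi)                 # exact value from Example 1.41
```

Any speed function can be passed in place of `helix_speed`; for a non-constant speed the accuracy depends on the number of subintervals `n`.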


Similar to the case in $\mathbb{R}^2$, if there are values of $t$ in the interval $[a,b]$ where the derivative
of a component function is not continuous then it is often possible to partition $[a,b]$ into
subintervals where all the component functions are continuously differentiable (except at
the endpoints, which can be ignored). The sum of the arc lengths over the subintervals will
be the arc length over $[a,b]$.

16 In particular, Duhamel’s principle is needed. See the proof in TAYLOR and MANN, § 14.2 and § 18.2.
Notice that the curve traced out by the function $\mathbf{f}(t) = (\cos t, \sin t, t)$ from Example 1.41 is
also traced out by the function $\mathbf{g}(t) = (\cos 2t, \sin 2t, 2t)$. For example, over the interval $[0,\pi]$,
$\mathbf{g}(t)$ traces out the same section of the curve as $\mathbf{f}(t)$ does over the interval $[0,2\pi]$. Intuitively,
this says that $\mathbf{g}(t)$ traces the curve twice as fast as $\mathbf{f}(t)$. This makes sense since, viewing the
functions as position vectors and their derivatives as velocity vectors, the speeds of $\mathbf{f}(t)$ and
$\mathbf{g}(t)$ are $\|\mathbf{f}'(t)\| = \sqrt{2}$ and $\|\mathbf{g}'(t)\| = 2\sqrt{2}$, respectively. We say that $\mathbf{g}(t)$ and $\mathbf{f}(t)$ are different
parametrizations of the same curve.


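Definition 1.14. Let $C$ be a smooth curve in $\mathbb{R}^3$ represented by a function $\mathbf{f}(t)$ defined on an
interval $[a,b]$, and let $\alpha : [c,d] \to [a,b]$ be a smooth one-to-one mapping of an interval $[c,d]$
onto $[a,b]$. Then the function $\mathbf{g} : [c,d] \to \mathbb{R}^3$ defined by $\mathbf{g}(s) = \mathbf{f}(\alpha(s))$ is a parametrization of
$C$ with parameter $s$. If $\alpha$ is strictly increasing on $[c,d]$ then we say that $\mathbf{g}(s)$ is equivalent
to $\mathbf{f}(t)$.

$$s \in [c,d] \ \xrightarrow{\ \alpha\ }\ t \in [a,b] \ \xrightarrow{\ \mathbf{f}\ }\ \mathbb{R}^3, \qquad \mathbf{g}(s) = \mathbf{f}(\alpha(s)) = \mathbf{f}(t)$$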


Note that the differentiability of $\mathbf{g}(s)$ follows from a version of the Chain Rule for
vector-valued functions (the proof is left as an exercise):


Theorem 1.21. Chain Rule: If $\mathbf{f}(t)$ is a differentiable vector-valued function of $t$, and $t =
\alpha(s)$ is a differentiable scalar function of $s$, then $\mathbf{f}(s) = \mathbf{f}(\alpha(s))$ is a differentiable vector-valued
function of $s$, and

$$\frac{d\mathbf{f}}{ds} = \frac{d\mathbf{f}}{dt}\,\frac{dt}{ds} \qquad (1.42)$$

for any $s$ where the composite function $\mathbf{f}(\alpha(s))$ is defined.


Example 1.42. The following are all equivalent parametrizations of the same curve:

$\mathbf{f}(t) = (\cos t, \sin t, t)$ for $t$ in $[0,2\pi]$
$\mathbf{g}(s) = (\cos 2s, \sin 2s, 2s)$ for $s$ in $[0,\pi]$
$\mathbf{h}(s) = (\cos 2\pi s, \sin 2\pi s, 2\pi s)$ for $s$ in $[0,1]$

To see that $\mathbf{g}(s)$ is equivalent to $\mathbf{f}(t)$, define $\alpha : [0,\pi] \to [0,2\pi]$ by $\alpha(s) = 2s$. Then $\alpha$ is smooth,
one-to-one, maps $[0,\pi]$ onto $[0,2\pi]$, and is strictly increasing (since $\alpha'(s) = 2 > 0$ for all $s$).
Likewise, defining $\alpha : [0,1] \to [0,2\pi]$ by $\alpha(s) = 2\pi s$ shows that $\mathbf{h}(s)$ is equivalent to $\mathbf{f}(t)$.
A curve can have many parametrizations, with different speeds, so which one is the best
to use? In some situations the arc length parametrization can be useful. The idea behind
this is to replace the parameter $t$, for any given smooth parametrization $\mathbf{f}(t)$ defined on $[a,b]$,
by the parameter $s$ given by

$$s = s(t) = \int_a^t \|\mathbf{f}'(u)\|\,du. \qquad (1.43)$$


In terms of motion along a curve, s is the distance traveled along the curve after time t 
has elapsed. So the new parameter will be distance instead of time. There is a natural 
correspondence between s and t: from a starting point on the curve, the distance traveled 
along the curve (in one direction) is uniquely determined by the amount of time elapsed, and 
vice versa. 

Since $s$ is the arc length of the curve over the interval $[a,t]$ for each $t$ in $[a,b]$, then it is a
function of $t$. By the Fundamental Theorem of Calculus, its derivative is

$$s'(t) = \frac{ds}{dt} = \frac{d}{dt}\int_a^t \|\mathbf{f}'(u)\|\,du = \|\mathbf{f}'(t)\| \quad\text{for all } t \text{ in } [a,b].$$

Since $\mathbf{f}(t)$ is smooth, then $\|\mathbf{f}'(t)\| > 0$ for all $t$ in $[a,b]$. Thus $s'(t) > 0$ and hence $s(t)$ is strictly
increasing on the interval $[a,b]$. Recall that this means that $s$ is a one-to-one mapping of the
interval $[a,b]$ onto the interval $[s(a), s(b)]$. But we see that

$$s(a) = \int_a^a \|\mathbf{f}'(u)\|\,du = 0 \quad\text{and}\quad s(b) = \int_a^b \|\mathbf{f}'(u)\|\,du = L = \text{arc length from } t = a \text{ to } t = b$$

So the function $s : [a,b] \to [0,L]$ is a one-to-one, differentiable mapping onto the interval
$[0,L]$. From single-variable calculus, we know that this means that there exists an inverse
function $\alpha : [0,L] \to [a,b]$ that is differentiable and the inverse of $s : [a,b] \to [0,L]$. That is,
for each $t$ in $[a,b]$ there is a unique $s$ in $[0,L]$ such that $s = s(t)$ and $t = \alpha(s)$. And we know
that the derivative of $\alpha$ is

$$\alpha'(s) = \frac{1}{s'(\alpha(s))} = \frac{1}{\|\mathbf{f}'(\alpha(s))\|}$$

Figure 1.9.1 $t = \alpha(s)$

So define the arc length parametrization $\mathbf{f} : [0,L] \to \mathbb{R}^3$ by

$$\mathbf{f}(s) = \mathbf{f}(\alpha(s)) \quad\text{for all } s \text{ in } [0,L].$$

Then $\mathbf{f}(s)$ is smooth, by the Chain Rule. In fact, $\mathbf{f}(s)$ has unit speed:

$$\begin{aligned}
\mathbf{f}'(s) &= \mathbf{f}'(\alpha(s))\,\alpha'(s) \quad\text{by the Chain Rule, so} \\
&= \mathbf{f}'(\alpha(s))\,\frac{1}{\|\mathbf{f}'(\alpha(s))\|} \text{ , so} \\
\|\mathbf{f}'(s)\| &= 1 \quad\text{for all } s \text{ in } [0,L].
\end{aligned}$$

So the arc length parametrization traverses the curve at a “normal” rate.
In practice, parametrizing a curve $\mathbf{f}(t)$ by arc length requires you to evaluate the integral
$s = \int_a^t \|\mathbf{f}'(u)\|\,du$ in some closed form (as a function of $t$) so that you could then solve for $t$ in
terms of $s$. If that can be done, you would then substitute the expression for $t$ in terms of $s$
(which we called $\alpha(s)$) into the formula for $\mathbf{f}(t)$ to get $\mathbf{f}(s)$.


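Example 1.43. Parametrize the helix $\mathbf{f}(t) = (\cos t, \sin t, t)$, for $t$ in $[0,2\pi]$, by arc length.
Solution: By Example 1.41 and formula (1.43), we have

$$s = \int_0^t \|\mathbf{f}'(u)\|\,du = \int_0^t \sqrt{2}\,du = \sqrt{2}\,t \quad\text{for all } t \text{ in } [0,2\pi].$$

So we can solve for $t$ in terms of $s$: $t = \alpha(s) = \dfrac{s}{\sqrt{2}}$.

$\therefore\ \mathbf{f}(s) = \left(\cos\dfrac{s}{\sqrt{2}},\ \sin\dfrac{s}{\sqrt{2}},\ \dfrac{s}{\sqrt{2}}\right)$ for all $s$ in $[0, 2\sqrt{2}\,\pi]$. Note that $\|\mathbf{f}'(s)\| = 1$.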
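
When the integral for $s(t)$ has no convenient closed form, the inverse $\alpha(s)$ can still be approximated numerically. The sketch below is one hedged way to do this, not a method from the text: `s_of_t` uses a midpoint rule and `alpha` inverts it by bisection, relying on the fact that $s(t)$ is strictly increasing. For the helix, the result should agree with $\alpha(s) = s/\sqrt{2}$.

```python
import math

def speed(t):
    """||f'(t)|| for the helix f(t) = (cos t, sin t, t)."""
    return math.sqrt(math.sin(t) ** 2 + math.cos(t) ** 2 + 1)

def s_of_t(t, n=200):
    """Arc length from 0 to t, approximated by the midpoint rule."""
    h = t / n
    return h * sum(speed((i + 0.5) * h) for i in range(n))

def alpha(s, t_max, tol=1e-10):
    """Solve s_of_t(t) = s for t in [0, t_max] by bisection."""
    lo, hi = 0.0, t_max
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if s_of_t(mid) < s:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(alpha(1.0, 2 * math.pi))  # numeric inverse at s = 1
print(1.0 / math.sqrt(2))       # exact value s / sqrt(2)
```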


Arc length plays an important role when discussing curvature and moving frame fields,
in the field of mathematics known as differential geometry. 17 The methods involve using
an arc length parametrization, which often leads to an integral that is either difficult or
impossible to evaluate in a simple closed form. The simple integral in Example 1.43 is
the exception, not the norm. In general, arc length parametrizations are more useful for
theoretical purposes than for practical computations. 18 Curvature and moving frame fields
can be defined without using arc length, which makes their computation much easier, and
these definitions can be shown to be equivalent to those using arc length. We will leave this
to the exercises.

The arc length for curves given in other coordinate systems can also be calculated: 


Theorem 1.22. Suppose that $r = r(t)$, $\theta = \theta(t)$ and $z = z(t)$ are the cylindrical coordinates of
a curve $\mathbf{f}(t)$, for $t$ in $[a,b]$. Then the arc length $L$ of the curve over $[a,b]$ is

$$L = \int_a^b \sqrt{r'(t)^2 + r(t)^2\,\theta'(t)^2 + z'(t)^2}\,dt \qquad (1.44)$$


Proof: The Cartesian coordinates $(x(t), y(t), z(t))$ of a point on the curve are given by

$$x(t) = r(t)\cos\theta(t), \quad y(t) = r(t)\sin\theta(t), \quad z(t) = z(t)$$

so differentiating the above expressions for $x(t)$ and $y(t)$ with respect to $t$ gives

$$x'(t) = r'(t)\cos\theta(t) - r(t)\,\theta'(t)\sin\theta(t), \quad y'(t) = r'(t)\sin\theta(t) + r(t)\,\theta'(t)\cos\theta(t)$$

17 See O’Neill for an introduction to elementary differential geometry. 

18 For example, the usual parametrizations of Bezier curves, which we discussed in Section 1.8, are polynomial 
functions in R 3 . This makes their computation relatively simple, which, in CAD, is desirable. But their arc 
length parametrizations are not only not polynomials, they are in fact usually impossible to calculate at all. 
and so

$$\begin{aligned}
x'(t)^2 + y'(t)^2 &= (r'(t)\cos\theta(t) - r(t)\,\theta'(t)\sin\theta(t))^2 + (r'(t)\sin\theta(t) + r(t)\,\theta'(t)\cos\theta(t))^2 \\
&= r'(t)^2(\cos^2\theta + \sin^2\theta) + r(t)^2\,\theta'(t)^2(\cos^2\theta + \sin^2\theta) \\
&\quad - 2r'(t)\,r(t)\,\theta'(t)\cos\theta\sin\theta + 2r'(t)\,r(t)\,\theta'(t)\cos\theta\sin\theta \\
&= r'(t)^2 + r(t)^2\,\theta'(t)^2 \text{ , and so}
\end{aligned}$$

$$\begin{aligned}
L &= \int_a^b \sqrt{x'(t)^2 + y'(t)^2 + z'(t)^2}\,dt \\
&= \int_a^b \sqrt{r'(t)^2 + r(t)^2\,\theta'(t)^2 + z'(t)^2}\,dt
\end{aligned}$$


QED


Example 1.44. Find the arc length $L$ of the curve whose cylindrical coordinates are $r = e^t$,
$\theta = t$ and $z = e^t$, for $t$ over the interval $[0,1]$.

Solution: Since $r'(t) = e^t$, $\theta'(t) = 1$ and $z'(t) = e^t$, then

$$\begin{aligned}
L &= \int_0^1 \sqrt{r'(t)^2 + r(t)^2\,\theta'(t)^2 + z'(t)^2}\,dt \\
&= \int_0^1 \sqrt{e^{2t} + e^{2t}(1) + e^{2t}}\,dt \\
&= \int_0^1 e^t\sqrt{3}\,dt = \sqrt{3}\,(e - 1)
\end{aligned}$$
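
Formula (1.44) lends itself to the same kind of numerical check. In this sketch the integrand of (1.44) is coded directly for the curve of Example 1.44 and integrated with a midpoint rule; the names `cyl_speed` and `midpoint_integral` are illustrative assumptions, not notation from the text.

```python
import math

def cyl_speed(t):
    """Integrand of (1.44) for r = e^t, theta = t, z = e^t."""
    r, r_prime = math.exp(t), math.exp(t)
    theta_prime = 1.0
    z_prime = math.exp(t)
    return math.sqrt(r_prime**2 + r**2 * theta_prime**2 + z_prime**2)

def midpoint_integral(fn, a, b, n=2000):
    """Composite midpoint rule for the integral of fn over [a, b]."""
    h = (b - a) / n
    return h * sum(fn(a + (i + 0.5) * h) for i in range(n))

print(midpoint_integral(cyl_speed, 0.0, 1.0))  # numeric value
print(math.sqrt(3) * (math.e - 1))             # exact value from Example 1.44
```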


Exercises 

A 

For Exercises 1-3, calculate the arc length of $\mathbf{f}(t)$ over the given interval.

1. $\mathbf{f}(t) = (3\cos 2t, 3\sin 2t, 3t)$ on $[0,\pi/2]$

2. $\mathbf{f}(t) = ((t^2 + 1)\cos t, (t^2 + 1)\sin t, 2\sqrt{2}\,t)$ on $[0,1]$

3. $\mathbf{f}(t) = (2\cos 3t, 2\sin 3t, 2t^{3/2})$ on $[0,1]$

4. Parametrize the curve from Exercise 1 by arc length.

5. Parametrize the curve from Exercise 3 by arc length.

B

6. Let $\mathbf{f}(t)$ be a differentiable curve such that $\mathbf{f}(t) \ne \mathbf{0}$ for all $t$. Show that

$$\frac{d}{dt}\left(\frac{\mathbf{f}(t)}{\|\mathbf{f}(t)\|}\right) = \frac{\mathbf{f}(t)\times(\mathbf{f}'(t)\times\mathbf{f}(t))}{\|\mathbf{f}(t)\|^3}.$$
Exercises 7-9 develop the moving frame field T, N, B at a point on a curve.


7. Let $\mathbf{f}(t)$ be a smooth curve such that $\mathbf{f}'(t) \ne \mathbf{0}$ for all $t$. Then we can define the unit tangent
vector $\mathbf{T}$ by

$$\mathbf{T}(t) = \frac{\mathbf{f}'(t)}{\|\mathbf{f}'(t)\|}.$$

Show that

$$\mathbf{T}'(t) = \frac{\mathbf{f}'(t)\times(\mathbf{f}''(t)\times\mathbf{f}'(t))}{\|\mathbf{f}'(t)\|^3}.$$

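8. Continuing Exercise 7, assume that $\mathbf{f}'(t)$ and $\mathbf{f}''(t)$ are not parallel. Then $\mathbf{T}'(t) \ne \mathbf{0}$ so we
can define the unit principal normal vector $\mathbf{N}$ by

$$\mathbf{N}(t) = \frac{\mathbf{T}'(t)}{\|\mathbf{T}'(t)\|}.$$

Show that

$$\mathbf{N}(t) = \frac{\mathbf{f}'(t)\times(\mathbf{f}''(t)\times\mathbf{f}'(t))}{\|\mathbf{f}'(t)\|\,\|\mathbf{f}''(t)\times\mathbf{f}'(t)\|}.$$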

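9. Continuing Exercise 8, the unit binormal vector $\mathbf{B}$ is defined by

$$\mathbf{B}(t) = \mathbf{T}(t)\times\mathbf{N}(t).$$

Show that

$$\mathbf{B}(t) = \frac{\mathbf{f}'(t)\times\mathbf{f}''(t)}{\|\mathbf{f}'(t)\times\mathbf{f}''(t)\|}.$$

Note: The vectors $\mathbf{T}(t)$, $\mathbf{N}(t)$ and $\mathbf{B}(t)$ form a right-handed system of mutually
perpendicular unit vectors (called orthonormal vectors) at each point on the curve $\mathbf{f}(t)$.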


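10. Continuing Exercise 9, the curvature $\kappa$ is defined by

$$\kappa(t) = \frac{\|\mathbf{T}'(t)\|}{\|\mathbf{f}'(t)\|} = \frac{\|\mathbf{f}'(t)\times(\mathbf{f}''(t)\times\mathbf{f}'(t))\|}{\|\mathbf{f}'(t)\|^4}.$$

Show that

$$\kappa(t) = \frac{\|\mathbf{f}'(t)\times\mathbf{f}''(t)\|}{\|\mathbf{f}'(t)\|^3} \quad\text{and that}\quad \mathbf{T}'(t) = \|\mathbf{f}'(t)\|\,\kappa(t)\,\mathbf{N}(t).$$

Note: $\kappa(t)$ gives a sense of how “curved” the curve $\mathbf{f}(t)$ is at each point.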


11. Find T, N, B and k at each point of the helix fit) = (cost, sin t,t). 

12. Show that the arc length L of a curve whose spherical coordinates are p = pit), 6 = Oit) 
and ip = (pit) for t in an interval [a, b ] is 

L = f b ^/p'it) 2 + (pit) 2 sin 2 (pit))B’it) 2 + p(t)V(t) 2 dt. 

Ja 

4 Line and Surface Integrals

4.1 Line Integrals

In single-variable calculus you learned how to integrate a real-valued function $f(x)$ over an
interval $[a,b]$ in $\mathbb{R}^1$. This integral (usually called a Riemann integral) can be thought of as
an integral over a path in $\mathbb{R}^1$, since an interval (or collection of intervals) is really the only
kind of “path” in $\mathbb{R}^1$. You may also recall that if $f(x)$ represented the force applied along the
x-axis to an object at position $x$ in $[a,b]$, then the work $W$ done in moving that object from
position $x = a$ to $x = b$ was defined as the integral:

\[ W = \int_a^b f(x)\,dx \]

In this section, we will see how to define the integral of a function (either real-valued or
vector-valued) of two variables over a general path (i.e. a curve) in $\mathbb{R}^2$. This definition will
be motivated by the physical notion of work. We will begin with real-valued functions of two
variables.

In physics, the intuitive idea of work is that

Work = Force × Distance .

Suppose that we want to find the total amount $W$ of work done in moving an object along a
curve $C$ in $\mathbb{R}^2$ with a smooth parametrization $x = x(t)$, $y = y(t)$, $a \le t \le b$, with a force $f(x,y)$
which varies with the position $(x,y)$ of the object and is applied in the direction of motion
along $C$ (see Figure 4.1.1 below).



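We will assume for now that the function $f(x,y)$ is continuous and real-valued, so we only
consider the magnitude of the force. Partition the interval $[a,b]$ as follows:

\[ a = t_0 < t_1 < t_2 < \cdots < t_{n-1} < t_n = b , \quad\text{for some integer } n \ge 2 \]

As we can see from Figure 4.1.1, over a typical subinterval $[t_i, t_{i+1}]$ the distance $\Delta s_i$ traveled
along the curve is approximately $\sqrt{\Delta x_i^2 + \Delta y_i^2}$, by the Pythagorean Theorem. Thus, if the
subinterval is small enough then the work done in moving the object along that piece of the
curve is approximately

\[ \text{Force} \times \text{Distance} \approx f(x_{i*}, y_{i*})\,\sqrt{\Delta x_i^2 + \Delta y_i^2} , \tag{4.1} \]

where $(x_{i*}, y_{i*}) = (x(t_i^*), y(t_i^*))$ for some $t_i^*$ in $[t_i, t_{i+1}]$, and so

\[ W \approx \sum_{i=0}^{n-1} f(x_{i*}, y_{i*})\,\sqrt{\Delta x_i^2 + \Delta y_i^2} \tag{4.2} \]

is approximately the total amount of work done over the entire curve. But since

\[ \sqrt{\Delta x_i^2 + \Delta y_i^2} = \sqrt{\left(\frac{\Delta x_i}{\Delta t_i}\right)^2 + \left(\frac{\Delta y_i}{\Delta t_i}\right)^2}\;\Delta t_i , \]

where $\Delta t_i = t_{i+1} - t_i$, then

\[ W \approx \sum_{i=0}^{n-1} f(x_{i*}, y_{i*})\,\sqrt{\left(\frac{\Delta x_i}{\Delta t_i}\right)^2 + \left(\frac{\Delta y_i}{\Delta t_i}\right)^2}\;\Delta t_i . \tag{4.3} \]

Taking the limit of that sum as the length of the largest subinterval goes to 0, the sum over
all subintervals becomes the integral from $t = a$ to $t = b$, $\frac{\Delta x_i}{\Delta t_i}$ and $\frac{\Delta y_i}{\Delta t_i}$ become $x'(t)$ and $y'(t)$,
respectively, and $f(x_{i*}, y_{i*})$ becomes $f(x(t), y(t))$, so that

\[ W = \int_a^b f(x(t), y(t))\,\sqrt{x'(t)^2 + y'(t)^2}\;dt . \tag{4.4} \]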
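To see the limiting process behind (4.4) concretely, here is a small Python sketch that evaluates the sum in (4.2) for increasing $n$ and compares it with the integral. The force $f(x,y) = x$ and the parabola $x = t$, $y = t^2$ are hypothetical choices made only for illustration, and the sample point $t_i^*$ is taken to be the midpoint of each subinterval.

```python
import math

def work_riemann_sum(f, x, y, a, b, n):
    """Approximate W by the sum of f(x_i*, y_i*) * sqrt(dx_i^2 + dy_i^2), as in (4.2)."""
    total = 0.0
    for i in range(n):
        t0 = a + (b - a) * i / n
        t1 = a + (b - a) * (i + 1) / n
        t_star = 0.5 * (t0 + t1)            # sample point in [t_i, t_{i+1}]
        dx = x(t1) - x(t0)
        dy = y(t1) - y(t0)
        total += f(x(t_star), y(t_star)) * math.hypot(dx, dy)
    return total

# Hypothetical test case: f(x, y) = x along x = t, y = t^2, 0 <= t <= 1.
f = lambda x, y: x
x = lambda t: t
y = lambda t: t * t

# Formula (4.4) for this case: integral of t*sqrt(1 + 4t^2) over [0, 1].
exact = (5 * math.sqrt(5) - 1) / 12

for n in (10, 100, 1000):
    print(n, work_riemann_sum(f, x, y, 0.0, 1.0, n), exact)
```

The printed sums should approach the value of the integral as $n$ grows, which is the content of the limit taken above.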


The integral on the right side of the above equation gives us our idea of how to define,
for any real-valued function $f(x,y)$, the integral of $f(x,y)$ along the curve $C$, called a line
integral:

Definition 4.1. For a real-valued function $f(x,y)$ and a curve $C$ in $\mathbb{R}^2$, parametrized by
$x = x(t)$, $y = y(t)$, $a \le t \le b$, the line integral of $f(x,y)$ along $C$ with respect to arc length
$s$ is

\[ \int_C f(x,y)\,ds = \int_a^b f(x(t),y(t))\,\sqrt{x'(t)^2 + y'(t)^2}\;dt . \tag{4.5} \]

The symbol $ds$ is the differential of the arc length function

\[ s = s(t) = \int_a^t \sqrt{x'(u)^2 + y'(u)^2}\;du , \tag{4.6} \]

which you may recognize from Section 1.9 as the length of the curve $C$ over the interval $[a,t]$,
for all $t$ in $[a,b]$. That is,

\[ ds = s'(t)\,dt = \sqrt{x'(t)^2 + y'(t)^2}\;dt , \tag{4.7} \]

by the Fundamental Theorem of Calculus.

For a general real-valued function $f(x,y)$, what does the line integral $\int_C f(x,y)\,ds$ represent?
The preceding discussion of $ds$ gives us a clue. You can think of differentials as
infinitesimal lengths. So if you think of $f(x,y)$ as the height of a picket fence along $C$, then
$f(x,y)\,ds$ can be thought of as approximately the area of a section of that fence over some
infinitesimally small section of the curve, and thus the line integral $\int_C f(x,y)\,ds$ is the total
area of that picket fence (see Figure 4.1.2).



Example 4.1. Use a line integral to show that the lateral surface area $A$ of a right circular
cylinder of radius $r$ and height $h$ is $2\pi rh$.

Solution: We will use the right circular cylinder with base circle $C$
given by $x^2 + y^2 = r^2$ and with height $h$ in the positive $z$ direction
(see Figure 4.1.3). Parametrize $C$ as follows:

\[ x = x(t) = r\cos t , \quad y = y(t) = r\sin t , \quad 0 \le t \le 2\pi \]

Let $f(x,y) = h$ for all $(x,y)$. Then

\[
\begin{aligned}
A &= \int_C f(x,y)\,ds = \int_a^b f(x(t),y(t))\,\sqrt{x'(t)^2 + y'(t)^2}\;dt \\
  &= \int_0^{2\pi} h\,\sqrt{(-r\sin t)^2 + (r\cos t)^2}\;dt \\
  &= h \int_0^{2\pi} r\,\sqrt{\sin^2 t + \cos^2 t}\;dt \\
  &= rh \int_0^{2\pi} 1\;dt = 2\pi rh
\end{aligned}
\]

Figure 4.1.3 (base circle $C : x^2 + y^2 = r^2$)
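Definition 4.1 translates directly into a short numerical routine. The Python sketch below is one possible way to do this, assuming the parametrization and its derivatives are supplied as functions; the names `simpson` and `scalar_line_integral`, and the sample values $r = 2$, $h = 5$, are hypothetical. It reproduces the lateral area from Example 4.1.

```python
import math

def simpson(g, a, b, n=1000):
    """Composite Simpson's rule for the integral of g over [a, b]; n must be even."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3

def scalar_line_integral(f, x, y, dx, dy, a, b):
    """Formula (4.5): integral of f(x(t), y(t)) * sqrt(x'(t)^2 + y'(t)^2) over [a, b]."""
    return simpson(lambda t: f(x(t), y(t)) * math.hypot(dx(t), dy(t)), a, b)

r, h = 2.0, 5.0   # hypothetical radius and height

area = scalar_line_integral(
    lambda x, y: h,                                   # f(x, y) = h
    lambda t: r * math.cos(t), lambda t: r * math.sin(t),
    lambda t: -r * math.sin(t), lambda t: r * math.cos(t),
    0.0, 2 * math.pi)

print(area, 2 * math.pi * r * h)
```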

Note in Example 4.1 that if we had traversed the circle $C$ twice, i.e. let $t$ vary from 0 to
$4\pi$, then we would have gotten an area of $4\pi rh$, i.e. twice the desired area, even though the
curve itself is still the same (namely, a circle of radius $r$). Also, notice that we traversed the
circle in the counter-clockwise direction. If we had gone in the clockwise direction, using the
parametrization

\[ x = x(t) = r\cos(2\pi - t) , \quad y = y(t) = r\sin(2\pi - t) , \quad 0 \le t \le 2\pi , \tag{4.8} \]

then it is easy to verify (see Exercise 12) that the value of the line integral is unchanged.

In general, it can be shown (see Exercise 15) that reversing the direction in which a curve
$C$ is traversed leaves $\int_C f(x,y)\,ds$ unchanged, for any $f(x,y)$. If a curve $C$ has a parametrization
$x = x(t)$, $y = y(t)$, $a \le t \le b$, then denote by $-C$ the same curve as $C$ but traversed in the
opposite direction. Then $-C$ is parametrized by

\[ x = x(a + b - t) , \quad y = y(a + b - t) , \quad a \le t \le b , \tag{4.9} \]

and we have

\[ \int_C f(x,y)\,ds = \int_{-C} f(x,y)\,ds . \tag{4.10} \]


Notice that our definition of the line integral was with respect to the arc length parameter
$s$. We can also define

\[ \int_C f(x,y)\,dx = \int_a^b f(x(t),y(t))\,x'(t)\,dt \tag{4.11} \]

as the line integral of $f(x,y)$ along $C$ with respect to $x$, and

\[ \int_C f(x,y)\,dy = \int_a^b f(x(t),y(t))\,y'(t)\,dt \tag{4.12} \]

as the line integral of $f(x,y)$ along $C$ with respect to $y$.

In the derivation of the formula for a line integral, we used the idea of work as force
multiplied by distance. However, we know that force is actually a vector. So it would be
helpful to develop a vector form for a line integral. For this, suppose that we have a function
$\mathbf{f}(x,y)$ defined on $\mathbb{R}^2$ by

\[ \mathbf{f}(x,y) = P(x,y)\,\mathbf{i} + Q(x,y)\,\mathbf{j} \]

for some continuous real-valued functions $P(x,y)$ and $Q(x,y)$ on $\mathbb{R}^2$. Such a function $\mathbf{f}$ is
called a vector field on $\mathbb{R}^2$. It is defined at points in $\mathbb{R}^2$, and its values are vectors in $\mathbb{R}^2$. For
a curve $C$ with a smooth parametrization $x = x(t)$, $y = y(t)$, $a \le t \le b$, let

\[ \mathbf{r}(t) = x(t)\,\mathbf{i} + y(t)\,\mathbf{j} \]
be the position vector for a point $(x(t),y(t))$ on $C$. Then $\mathbf{r}'(t) = x'(t)\,\mathbf{i} + y'(t)\,\mathbf{j}$ and so

\[
\begin{aligned}
\int_C P(x,y)\,dx + \int_C Q(x,y)\,dy
  &= \int_a^b P(x(t),y(t))\,x'(t)\,dt + \int_a^b Q(x(t),y(t))\,y'(t)\,dt \\
  &= \int_a^b \bigl( P(x(t),y(t))\,x'(t) + Q(x(t),y(t))\,y'(t) \bigr)\,dt \\
  &= \int_a^b \mathbf{f}(x(t),y(t)) \cdot \mathbf{r}'(t)\,dt
\end{aligned}
\]

by definition of $\mathbf{f}(x,y)$. Notice that the function $\mathbf{f}(x(t),y(t)) \cdot \mathbf{r}'(t)$ is a real-valued function on
$[a,b]$, so the last integral on the right looks somewhat similar to our earlier definition of a
line integral. This leads us to the following definition:


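Definition 4.2. For a vector field $\mathbf{f}(x,y) = P(x,y)\,\mathbf{i} + Q(x,y)\,\mathbf{j}$ and a curve $C$ with a smooth
parametrization $x = x(t)$, $y = y(t)$, $a \le t \le b$, the line integral of $\mathbf{f}$ along $C$ is

\[ \int_C \mathbf{f} \cdot d\mathbf{r} = \int_C P(x,y)\,dx + \int_C Q(x,y)\,dy \tag{4.13} \]

\[ \phantom{\int_C \mathbf{f} \cdot d\mathbf{r}} = \int_a^b \mathbf{f}(x(t),y(t)) \cdot \mathbf{r}'(t)\,dt , \tag{4.14} \]

where $\mathbf{r}(t) = x(t)\,\mathbf{i} + y(t)\,\mathbf{j}$ is the position vector for points on $C$.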


We use the notation $d\mathbf{r} = \mathbf{r}'(t)\,dt = dx\,\mathbf{i} + dy\,\mathbf{j}$ to denote the differential of the vector-valued
function $\mathbf{r}$. The line integral in Definition 4.2 is often called a line integral of a vector field
to distinguish it from the line integral in Definition 4.1 which is called a line integral of a
scalar field. For convenience we will often write

\[ \int_C P(x,y)\,dx + \int_C Q(x,y)\,dy = \int_C P(x,y)\,dx + Q(x,y)\,dy , \]

where it is understood that the line integral along $C$ is being applied to both $P$ and $Q$. The
quantity $P(x,y)\,dx + Q(x,y)\,dy$ is known as a differential form. For a real-valued function
$F(x,y)$, the differential of $F$ is $dF = \frac{\partial F}{\partial x}\,dx + \frac{\partial F}{\partial y}\,dy$. A differential form $P(x,y)\,dx + Q(x,y)\,dy$
is called exact if it equals $dF$ for some function $F(x,y)$.

Recall that if the points on a curve $C$ have position vector $\mathbf{r}(t) = x(t)\,\mathbf{i} + y(t)\,\mathbf{j}$, then $\mathbf{r}'(t)$ is a
tangent vector to $C$ at the point $(x(t),y(t))$ in the direction of increasing $t$ (which we call the
direction of $C$). Since $C$ is a smooth curve, then $\mathbf{r}'(t) \ne \mathbf{0}$ on $[a,b]$ and hence

\[ \mathbf{T}(t) = \frac{\mathbf{r}'(t)}{\|\mathbf{r}'(t)\|} \]

is the unit tangent vector to $C$ at $(x(t),y(t))$. Putting Definitions 4.1 and 4.2 together we get
the following theorem:

Theorem 4.1. For a vector field $\mathbf{f}(x,y) = P(x,y)\,\mathbf{i} + Q(x,y)\,\mathbf{j}$ and a curve $C$ with a smooth
parametrization $x = x(t)$, $y = y(t)$, $a \le t \le b$ and position vector $\mathbf{r}(t) = x(t)\,\mathbf{i} + y(t)\,\mathbf{j}$,

\[ \int_C \mathbf{f} \cdot d\mathbf{r} = \int_C \mathbf{f} \cdot \mathbf{T}\,ds , \tag{4.15} \]

where $\mathbf{T}(t) = \frac{\mathbf{r}'(t)}{\|\mathbf{r}'(t)\|}$ is the unit tangent vector to $C$ at $(x(t),y(t))$.

If the vector field $\mathbf{f}(x,y)$ represents the force moving an object along a curve $C$, then the work
$W$ done by this force is

\[ W = \int_C \mathbf{f} \cdot \mathbf{T}\,ds = \int_C \mathbf{f} \cdot d\mathbf{r} . \tag{4.16} \]


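Example 4.2. Evaluate $\int_C (x^2 + y^2)\,dx + 2xy\,dy$, where:

(a) $C : x = t$, $y = 2t$, $0 \le t \le 1$

(b) $C : x = t$, $y = 2t^2$, $0 \le t \le 1$

Solution: Figure 4.1.4 shows both curves.

(a) Since $x'(t) = 1$ and $y'(t) = 2$, then

\[
\begin{aligned}
\int_C (x^2 + y^2)\,dx + 2xy\,dy
  &= \int_0^1 \bigl( (x(t)^2 + y(t)^2)\,x'(t) + 2x(t)\,y(t)\,y'(t) \bigr)\,dt \\
  &= \int_0^1 \bigl( (t^2 + 4t^2)(1) + 2t(2t)(2) \bigr)\,dt \\
  &= \int_0^1 13t^2\,dt \\
  &= \frac{13t^3}{3}\,\Big|_0^1 = \frac{13}{3}
\end{aligned}
\]

(b) Since $x'(t) = 1$ and $y'(t) = 4t$, then

\[
\begin{aligned}
\int_C (x^2 + y^2)\,dx + 2xy\,dy
  &= \int_0^1 \bigl( (x(t)^2 + y(t)^2)\,x'(t) + 2x(t)\,y(t)\,y'(t) \bigr)\,dt \\
  &= \int_0^1 \bigl( (t^2 + 4t^4)(1) + 2t(2t^2)(4t) \bigr)\,dt \\
  &= \int_0^1 (t^2 + 20t^4)\,dt \\
  &= \frac{t^3}{3} + 4t^5\,\Big|_0^1 = \frac{1}{3} + 4 = \frac{13}{3}
\end{aligned}
\]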

So in both cases, if the vector field $\mathbf{f}(x,y) = (x^2 + y^2)\,\mathbf{i} + 2xy\,\mathbf{j}$ represents the force moving an
object from $(0,0)$ to $(1,2)$ along the given curve $C$, then the work done is $\frac{13}{3}$. This may lead
you to think that work (and more generally, the line integral of a vector field) is independent
of the path taken. However, as we will see in the next section, this is not always the case.
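Both parts of Example 4.2 can be checked numerically from formula (4.14). The following Python sketch is a minimal illustration, assuming the curve and its derivatives are passed in as functions; `simpson` and `vector_line_integral` are hypothetical helper names, not notation from the text.

```python
def simpson(g, a, b, n=1000):
    """Composite Simpson's rule for the integral of g over [a, b]; n must be even."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3

def vector_line_integral(P, Q, x, y, dx, dy, a, b):
    """Formula (4.14): integral of P(x(t),y(t))*x'(t) + Q(x(t),y(t))*y'(t) over [a, b]."""
    return simpson(lambda t: P(x(t), y(t)) * dx(t) + Q(x(t), y(t)) * dy(t), a, b)

P = lambda x, y: x**2 + y**2
Q = lambda x, y: 2 * x * y

# (a) the line x = t, y = 2t
work_a = vector_line_integral(P, Q, lambda t: t, lambda t: 2 * t,
                              lambda t: 1.0, lambda t: 2.0, 0.0, 1.0)

# (b) the parabola x = t, y = 2t^2
work_b = vector_line_integral(P, Q, lambda t: t, lambda t: 2 * t * t,
                              lambda t: 1.0, lambda t: 4 * t, 0.0, 1.0)

print(work_a, work_b, 13 / 3)
```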


Although we defined line integrals over a single smooth curve, if $C$ is a piecewise smooth
curve, that is

\[ C = C_1 \cup C_2 \cup \dots \cup C_n \]

is the union of smooth curves $C_1, \dots, C_n$, then we can define

\[
\int_C \mathbf{f} \cdot d\mathbf{r} = \int_{C_1} \mathbf{f} \cdot d\mathbf{r}_1 + \int_{C_2} \mathbf{f} \cdot d\mathbf{r}_2 + \dots + \int_{C_n} \mathbf{f} \cdot d\mathbf{r}_n
\]

where each $\mathbf{r}_i$ is the position vector of the curve $C_i$.


Example 4.3. Evaluate $\int_C (x^2 + y^2)\,dx + 2xy\,dy$, where $C$ is the polygonal path from $(0,0)$ to
$(0,2)$ to $(1,2)$.

Solution: Write $C = C_1 \cup C_2$, where $C_1$ is the curve given by $x = 0$, $y = t$,
$0 \le t \le 2$ and $C_2$ is the curve given by $x = t$, $y = 2$, $0 \le t \le 1$ (see Figure
4.1.5). Then

\[
\begin{aligned}
\int_C (x^2 + y^2)\,dx + 2xy\,dy
  &= \int_{C_1} (x^2 + y^2)\,dx + 2xy\,dy + \int_{C_2} (x^2 + y^2)\,dx + 2xy\,dy \\
  &= \int_0^2 \bigl( (0^2 + t^2)(0) + 2(0)t(1) \bigr)\,dt + \int_0^1 \bigl( (t^2 + 4)(1) + 2t(2)(0) \bigr)\,dt \\
  &= \int_0^2 0\,dt + \int_0^1 (t^2 + 4)\,dt \\
  &= \frac{t^3}{3} + 4t\,\Big|_0^1 = \frac{1}{3} + 4 = \frac{13}{3}
\end{aligned}
\]

Figure 4.1.5

Line integral notation varies quite a bit. For example, in physics it is common to see the
notation $\int_a^b \mathbf{f} \cdot d\mathbf{l}$, where it is understood that the limits of integration $a$ and $b$ are for the
underlying parameter $t$ of the curve, and the letter $\mathbf{l}$ signifies length. Also, the formulation
$\int_C \mathbf{f} \cdot \mathbf{T}\,ds$ from Theorem 4.1 is often preferred in physics since it emphasizes the idea of
integrating the tangential component $\mathbf{f} \cdot \mathbf{T}$ of $\mathbf{f}$ in the direction of $\mathbf{T}$ (i.e. in the direction of $C$),
which is a useful physical interpretation of line integrals.

Exercises

A

For Exercises 1-4, calculate $\int_C f(x,y)\,ds$ for the given function $f(x,y)$ and curve $C$.

1. $f(x,y) = xy$; $C : x = \cos t$, $y = \sin t$, $0 \le t \le \pi/2$

2. $f(x,y) = \dfrac{x}{x^2 + 1}$; $C : x = t$, $y = 0$, $0 \le t \le 1$

3. $f(x,y) = 2x + y$; $C$: polygonal path from $(0,0)$ to $(3,0)$ to $(3,2)$

4. $f(x,y) = x + y^2$; $C$: path from $(2,0)$ counterclockwise along the circle $x^2 + y^2 = 4$ to the
point $(-2,0)$ and then back to $(2,0)$ along the x-axis

5. Use a line integral to find the lateral surface area of the part of the cylinder
$x^2 + y^2 = 4$ below the plane $x + 2y + z = 6$ and above the xy-plane.

For Exercises 6-11, calculate $\int_C \mathbf{f} \cdot d\mathbf{r}$ for the given vector field $\mathbf{f}(x,y)$ and curve $C$.

6. $\mathbf{f}(x,y) = \mathbf{i} - \mathbf{j}$; $C : x = 3t$, $y = 2t$, $0 \le t \le 1$

7. $\mathbf{f}(x,y) = y\,\mathbf{i} - x\,\mathbf{j}$; $C : x = \cos t$, $y = \sin t$, $0 \le t \le 2\pi$

8. $\mathbf{f}(x,y) = x\,\mathbf{i} + y\,\mathbf{j}$; $C : x = \cos t$, $y = \sin t$, $0 \le t \le 2\pi$

9. $\mathbf{f}(x,y) = (x^2 - y)\,\mathbf{i} + (x - y^2)\,\mathbf{j}$; $C : x = \cos t$, $y = \sin t$, $0 \le t \le 2\pi$

10. $\mathbf{f}(x,y) = xy^2\,\mathbf{i} + xy^3\,\mathbf{j}$; $C$ : the polygonal path from $(0,0)$ to $(1,0)$ to $(0,1)$ to $(0,0)$

11. $\mathbf{f}(x,y) = (x^2 + y^2)\,\mathbf{i}$; $C : x = 2 + \cos t$, $y = \sin t$, $0 \le t \le 2\pi$

B

12. Verify that the value of the line integral in Example 4.1 is unchanged when using the
parametrization of the circle $C$ given in formulas (4.8).

13. Show that if $\mathbf{f} \perp \mathbf{r}'(t)$ at each point $\mathbf{r}(t)$ along a smooth curve $C$, then $\int_C \mathbf{f} \cdot d\mathbf{r} = 0$.

14. Show that if $\mathbf{f}$ points in the same direction as $\mathbf{r}'(t)$ at each point $\mathbf{r}(t)$ along a smooth
curve $C$, then $\int_C \mathbf{f} \cdot d\mathbf{r} = \int_C \|\mathbf{f}\|\,ds$.

C

15. Prove that $\int_C f(x,y)\,ds = \int_{-C} f(x,y)\,ds$. (Hint: Use formulas (4.9).)

16. Let $C$ be a smooth curve with arc length $L$, and suppose that $\mathbf{f}(x,y) = P(x,y)\,\mathbf{i} + Q(x,y)\,\mathbf{j}$
is a vector field such that $\|\mathbf{f}(x,y)\| \le M$ for all $(x,y)$ on $C$. Show that
$\left| \int_C \mathbf{f} \cdot d\mathbf{r} \right| \le ML$. (Hint: Recall that $\left| \int_a^b g(x)\,dx \right| \le \int_a^b |g(x)|\,dx$ for Riemann integrals.)

17. Prove that the Riemann integral $\int_a^b f(x)\,dx$ is a special case of a line integral.

4.2 Properties of Line Integrals

We know from the previous section that for line integrals of real-valued functions (scalar
fields), reversing the direction in which the integral is taken along a curve does not change
the value of the line integral:

\[ \int_C f(x,y)\,ds = \int_{-C} f(x,y)\,ds \tag{4.17} \]

For line integrals of vector fields, however, the value does change. To see this, let $\mathbf{f}(x,y) =
P(x,y)\,\mathbf{i} + Q(x,y)\,\mathbf{j}$ be a vector field, with $P$ and $Q$ continuously differentiable functions. Let
$C$ be a smooth curve parametrized by $x = x(t)$, $y = y(t)$, $a \le t \le b$, with position vector $\mathbf{r}(t) =
x(t)\,\mathbf{i} + y(t)\,\mathbf{j}$ (we will usually abbreviate this by saying that $C : \mathbf{r}(t) = x(t)\,\mathbf{i} + y(t)\,\mathbf{j}$ is a smooth
curve). We know that the curve $-C$ traversed in the opposite direction is parametrized by
$x = x(a + b - t)$, $y = y(a + b - t)$, $a \le t \le b$. Then


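\[
\begin{aligned}
\int_{-C} P(x,y)\,dx
  &= \int_a^b P(x(a+b-t), y(a+b-t))\,\frac{d}{dt}\bigl(x(a+b-t)\bigr)\,dt \\
  &= \int_a^b P(x(a+b-t), y(a+b-t))\,(-x'(a+b-t))\,dt \quad\text{(by the Chain Rule)} \\
  &= \int_b^a P(x(u),y(u))\,(-x'(u))\,(-du) \quad\text{(by letting } u = a + b - t\text{)} \\
  &= \int_b^a P(x(u),y(u))\,x'(u)\,du \\
  &= -\int_a^b P(x(u),y(u))\,x'(u)\,du , \quad\text{since } \int_b^a = -\int_a^b , \text{ so}
\end{aligned}
\]

\[ \int_{-C} P(x,y)\,dx = -\int_C P(x,y)\,dx \]

since we are just using a different letter ($u$) for the line integral along $C$. A similar argument
shows that

\[ \int_{-C} Q(x,y)\,dy = -\int_C Q(x,y)\,dy , \]

and hence

\[
\begin{aligned}
\int_{-C} \mathbf{f} \cdot d\mathbf{r}
  &= \int_{-C} P(x,y)\,dx + \int_{-C} Q(x,y)\,dy \\
  &= -\int_C P(x,y)\,dx + -\int_C Q(x,y)\,dy \\
  &= -\left( \int_C P(x,y)\,dx + \int_C Q(x,y)\,dy \right)
\end{aligned}
\]

\[ \int_{-C} \mathbf{f} \cdot d\mathbf{r} = -\int_C \mathbf{f} \cdot d\mathbf{r} . \tag{4.18} \]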





The above formula can be interpreted in terms of the work done by a force f (x,y) (treated 
as a vector) moving an object along a curve C: the total work performed moving the object 
along C from its initial point to its terminal point, and then back to the initial point moving 
backwards along the same path, is zero. This is because when force is considered as a vector, 
direction is accounted for. 
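The contrast between (4.17) and (4.18) is easy to observe numerically. The Python sketch below builds the reversed parametrization (4.9) from a given one and compares both kinds of line integral along $C$ and $-C$. The scalar field $f(x,y) = x + y$ is a hypothetical choice for illustration; the vector field and the curve are those of Example 4.2(b), and all helper names are assumptions of this sketch.

```python
import math

def simpson(g, a, b, n=1000):
    """Composite Simpson's rule for the integral of g over [a, b]; n must be even."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3

def reverse(x, y, dx, dy, a, b):
    """Parametrization (4.9) of -C; the derivatives pick up a minus sign by the Chain Rule."""
    return (lambda t: x(a + b - t), lambda t: y(a + b - t),
            lambda t: -dx(a + b - t), lambda t: -dy(a + b - t))

def scalar_integral(f, x, y, dx, dy, a, b):
    return simpson(lambda t: f(x(t), y(t)) * math.hypot(dx(t), dy(t)), a, b)

def vector_integral(P, Q, x, y, dx, dy, a, b):
    return simpson(lambda t: P(x(t), y(t)) * dx(t) + Q(x(t), y(t)) * dy(t), a, b)

a, b = 0.0, 1.0
C = (lambda t: t, lambda t: 2 * t * t, lambda t: 1.0, lambda t: 4 * t)
minus_C = reverse(*C, a, b)

f = lambda x, y: x + y            # hypothetical scalar field
P = lambda x, y: x**2 + y**2      # vector field from Example 4.2
Q = lambda x, y: 2 * x * y

s_fwd = scalar_integral(f, *C, a, b)
s_rev = scalar_integral(f, *minus_C, a, b)
v_fwd = vector_integral(P, Q, *C, a, b)
v_rev = vector_integral(P, Q, *minus_C, a, b)

print(s_fwd, s_rev)   # equal, as in (4.17)
print(v_fwd, v_rev)   # opposite signs, as in (4.18)
```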

The preceding discussion shows the importance of always taking the direction of the curve 
into account when using line integrals of vector fields. For this reason, the curves in line 
integrals are sometimes referred to as directed curves or oriented curves. 

Recall that our definition of a line integral required that we have a parametrization $x =
x(t)$, $y = y(t)$, $a \le t \le b$ for the curve $C$. But as we know, any curve has infinitely many
parametrizations. So could we get a different value for a line integral using some other
parametrization of $C$, say, $x = \tilde{x}(u)$, $y = \tilde{y}(u)$, $c \le u \le d$? If so, this would mean that our
definition is not well-defined. Luckily, it turns out that the value of a line integral of a
vector field is unchanged as long as the direction of the curve $C$ is preserved by whatever
parametrization is chosen:


Theorem 4.2. Let $\mathbf{f}(x,y) = P(x,y)\,\mathbf{i} + Q(x,y)\,\mathbf{j}$ be a vector field, and let $C$ be a smooth curve
parametrized by $x = x(t)$, $y = y(t)$, $a \le t \le b$. Suppose that $t = \alpha(u)$ for $c \le u \le d$, such that
$a = \alpha(c)$, $b = \alpha(d)$, and $\alpha'(u) > 0$ on the open interval $(c,d)$ (i.e. $\alpha(u)$ is strictly increasing on
$[c,d]$). Then $\int_C \mathbf{f} \cdot d\mathbf{r}$ has the same value for the parametrizations $x = x(t)$, $y = y(t)$, $a \le t \le b$
and $x = \tilde{x}(u) = x(\alpha(u))$, $y = \tilde{y}(u) = y(\alpha(u))$, $c \le u \le d$.


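Proof: Since $\alpha(u)$ is strictly increasing and maps $[c,d]$ onto $[a,b]$, then we know that $t =
\alpha(u)$ has an inverse function $u = \alpha^{-1}(t)$ defined on $[a,b]$ such that $c = \alpha^{-1}(a)$, $d = \alpha^{-1}(b)$,
and $\frac{du}{dt} = \frac{1}{\alpha'(u)}$. Also, $dt = \alpha'(u)\,du$, and by the Chain Rule

\[
\tilde{x}'(u) = \frac{d\tilde{x}}{du} = \frac{d}{du}\bigl(x(\alpha(u))\bigr) = \frac{dx}{dt}\,\frac{dt}{du} = x'(t)\,\alpha'(u)
\quad\Rightarrow\quad x'(t) = \frac{\tilde{x}'(u)}{\alpha'(u)}
\]

so making the substitution $t = \alpha(u)$ gives

\[
\begin{aligned}
\int_a^b P(x(t),y(t))\,x'(t)\,dt
  &= \int_{\alpha^{-1}(a)}^{\alpha^{-1}(b)} P(x(\alpha(u)),y(\alpha(u)))\,\frac{\tilde{x}'(u)}{\alpha'(u)}\,(\alpha'(u)\,du) \\
  &= \int_c^d P(\tilde{x}(u),\tilde{y}(u))\,\tilde{x}'(u)\,du ,
\end{aligned}
\]

which shows that $\int_C P(x,y)\,dx$ has the same value for both parametrizations. A similar
argument shows that $\int_C Q(x,y)\,dy$ has the same value for both parametrizations, and hence
$\int_C \mathbf{f} \cdot d\mathbf{r}$ has the same value. QED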


Notice that the condition $\alpha'(u) > 0$ in Theorem 4.2 means that the two parametrizations
move along $C$ in the same direction. That was not the case with the “reverse” parametrization
for $-C$: for $u = a + b - t$ we have $t = \alpha(u) = a + b - u \Rightarrow \alpha'(u) = -1 < 0$.

Example 4.4. Evaluate the line integral $\int_C (x^2 + y^2)\,dx + 2xy\,dy$ from Example 4.2, Section
4.1, along the curve $C : x = t$, $y = 2t^2$, $0 \le t \le 1$, where $t = \sin u$ for $0 \le u \le \pi/2$.

Solution: First, we notice that $0 = \sin 0$, $1 = \sin(\pi/2)$, and $\frac{dt}{du} = \cos u > 0$ on $(0,\pi/2)$. So by
Theorem 4.2 we know that if $C$ is parametrized by

\[ x = \sin u , \quad y = 2\sin^2 u , \quad 0 \le u \le \pi/2 \]

then $\int_C (x^2 + y^2)\,dx + 2xy\,dy$ should have the same value as we found in Example 4.2, namely
$\frac{13}{3}$. And we can indeed verify this:

\[
\begin{aligned}
\int_C (x^2 + y^2)\,dx + 2xy\,dy
  &= \int_0^{\pi/2} \bigl( (\sin^2 u + (2\sin^2 u)^2)\cos u + 2(\sin u)(2\sin^2 u)\,4\sin u\cos u \bigr)\,du \\
  &= \int_0^{\pi/2} (\sin^2 u + 20\sin^4 u)\cos u\,du \\
  &= \frac{\sin^3 u}{3} + 4\sin^5 u\,\Big|_0^{\pi/2} \\
  &= \frac{1}{3} + 4 = \frac{13}{3}
\end{aligned}
\]

In other words, the line integral is unchanged whether $t$ or $u$ is the parameter for $C$.
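Theorem 4.2 can also be observed numerically. The Python sketch below evaluates the line integral from Example 4.4 once with the parameter $t$ and once with the parameter $u$, where $t = \sin u$. The helper names are hypothetical, and the derivatives of the reparametrized curve are entered by hand from the Chain Rule.

```python
import math

def simpson(g, a, b, n=1000):
    """Composite Simpson's rule for the integral of g over [a, b]; n must be even."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3

def vector_line_integral(P, Q, x, y, dx, dy, a, b):
    """Integral of P*x' + Q*y' over the parameter interval [a, b]."""
    return simpson(lambda s: P(x(s), y(s)) * dx(s) + Q(x(s), y(s)) * dy(s), a, b)

P = lambda x, y: x**2 + y**2
Q = lambda x, y: 2 * x * y

# Parameter t: x = t, y = 2t^2, 0 <= t <= 1.
value_t = vector_line_integral(P, Q, lambda t: t, lambda t: 2 * t * t,
                               lambda t: 1.0, lambda t: 4 * t, 0.0, 1.0)

# Parameter u with t = sin(u): x = sin u, y = 2 sin^2 u, 0 <= u <= pi/2.
value_u = vector_line_integral(P, Q,
                               lambda u: math.sin(u), lambda u: 2 * math.sin(u)**2,
                               lambda u: math.cos(u), lambda u: 4 * math.sin(u) * math.cos(u),
                               0.0, math.pi / 2)

print(value_t, value_u, 13 / 3)
```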


By a closed curve, we mean a curve C whose initial point and terminal point are the 
same, i.e. for C : x = x(t), y = y(t), a < t < b, we have (x(a),y(a)) = (x(b),y(b)). 




(a) Closed (b) Not closed 

Figure 4.2.1 Closed vs nonclosed curves 


A simple closed curve is a closed curve which does not intersect itself. Note that any
closed curve can be regarded as a union of simple closed curves (think of the loops in a figure
eight). We use the special notation

\[ \oint_C f(x,y)\,ds \quad\text{and}\quad \oint_C \mathbf{f} \cdot d\mathbf{r} \]

to denote line integrals of scalar and vector fields, respectively, along closed curves. In some
older texts you may see the notation ∳ or ∲ to indicate a line integral traversing a closed
curve in a counterclockwise or clockwise direction, respectively.

So far, the examples we have seen of line integrals (e.g. Example 4.2) have had the same
value for different curves joining the initial point to the terminal point. That is, the line
integral has been independent of the path joining the two points. As we mentioned before,
this is not always the case. The following theorem gives a necessary and sufficient condition
for this path independence:


Theorem 4.3. In a region $R$, the line integral $\int_C \mathbf{f} \cdot d\mathbf{r}$ is independent of the path between
any two points in $R$ if and only if $\oint_C \mathbf{f} \cdot d\mathbf{r} = 0$ for every closed curve $C$ which is contained in
$R$.


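Proof: Suppose that $\oint_C \mathbf{f} \cdot d\mathbf{r} = 0$ for every closed curve $C$ which is contained in $R$. Let $P_1$
and $P_2$ be two distinct points in $R$. Let $C_1$ be a curve in $R$ going from $P_1$ to $P_2$, and let $C_2$
be another curve in $R$ going from $P_1$ to $P_2$, as in Figure 4.2.2.

Then $C = C_1 \cup -C_2$ is a closed curve in $R$ (from $P_1$ to $P_1$), and so $\oint_C \mathbf{f} \cdot d\mathbf{r} = 0$. Thus,

\[
0 = \oint_C \mathbf{f} \cdot d\mathbf{r} = \int_{C_1} \mathbf{f} \cdot d\mathbf{r} + \int_{-C_2} \mathbf{f} \cdot d\mathbf{r}
  = \int_{C_1} \mathbf{f} \cdot d\mathbf{r} - \int_{C_2} \mathbf{f} \cdot d\mathbf{r} ,
\]

and so $\int_{C_1} \mathbf{f} \cdot d\mathbf{r} = \int_{C_2} \mathbf{f} \cdot d\mathbf{r}$. This proves path independence.

Conversely, suppose that the line integral $\int_C \mathbf{f} \cdot d\mathbf{r}$ is independent of the path between any
two points in $R$. Let $C$ be a closed curve contained in $R$. Let $P_1$ and $P_2$ be two distinct points
on $C$. Let $C_1$ be a part of the curve $C$ that goes from $P_1$ to $P_2$, and let $C_2$ be the remaining
part of $C$ that goes from $P_1$ to $P_2$, again as in Figure 4.2.2. Then by path independence we
have

\[
\begin{aligned}
\int_{C_1} \mathbf{f} \cdot d\mathbf{r} &= \int_{C_2} \mathbf{f} \cdot d\mathbf{r} \\
\int_{C_1} \mathbf{f} \cdot d\mathbf{r} - \int_{C_2} \mathbf{f} \cdot d\mathbf{r} &= 0 \\
\int_{C_1} \mathbf{f} \cdot d\mathbf{r} + \int_{-C_2} \mathbf{f} \cdot d\mathbf{r} &= 0 , \text{ so} \\
\oint_C \mathbf{f} \cdot d\mathbf{r} &= 0
\end{aligned}
\]

since $C = C_1 \cup -C_2$. QED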

Clearly, the above theorem does not give a practical way to determine path independence,
since it is impossible to check the line integrals around all possible closed curves in a region.
What it mostly does is give an idea of the way in which line integrals behave, and how seemingly
unrelated line integrals can be related (in this case, a specific line integral between
two points and all line integrals around closed curves).

For a more practical method for determining path independence, we first need a version 
of the Chain Rule for multivariable functions: 


Theorem 4.4. (Chain Rule) If $z = f(x,y)$ is a continuously differentiable function of $x$ and
$y$, and both $x = x(t)$ and $y = y(t)$ are differentiable functions of $t$, then $z$ is a differentiable
function of $t$, and

\[ \frac{dz}{dt} = \frac{\partial z}{\partial x}\,\frac{dx}{dt} + \frac{\partial z}{\partial y}\,\frac{dy}{dt} \tag{4.19} \]

at all points where the derivatives on the right are defined.

The proof is virtually identical to the proof of Theorem 2.2 from Section 2.4 (which uses the
Mean Value Theorem), so we omit it.¹ We will now use this Chain Rule to prove the following
sufficient condition for path independence of line integrals:


Theorem 4.5. Let $\mathbf{f}(x,y) = P(x,y)\,\mathbf{i} + Q(x,y)\,\mathbf{j}$ be a vector field in some region $R$, with $P$ and
$Q$ continuously differentiable functions on $R$. Let $C$ be a smooth curve in $R$ parametrized
by $x = x(t)$, $y = y(t)$, $a \le t \le b$. Suppose that there is a real-valued function $F(x,y)$ such that
$\nabla F = \mathbf{f}$ on $R$. Then

\[ \int_C \mathbf{f} \cdot d\mathbf{r} = F(B) - F(A) , \tag{4.20} \]

where $A = (x(a),y(a))$ and $B = (x(b),y(b))$ are the endpoints of $C$. Thus, the line integral is
independent of the path between its endpoints, since it depends only on the values of $F$ at
those endpoints.


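Proof: By definition of $\int_C \mathbf{f} \cdot d\mathbf{r}$, we have

\[
\begin{aligned}
\int_C \mathbf{f} \cdot d\mathbf{r}
  &= \int_a^b \bigl( P(x(t),y(t))\,x'(t) + Q(x(t),y(t))\,y'(t) \bigr)\,dt \\
  &= \int_a^b \left( \frac{\partial F}{\partial x}\,\frac{dx}{dt} + \frac{\partial F}{\partial y}\,\frac{dy}{dt} \right) dt
     \quad\text{(since } \nabla F = \mathbf{f} \Rightarrow \tfrac{\partial F}{\partial x} = P \text{ and } \tfrac{\partial F}{\partial y} = Q\text{)} \\
  &= \int_a^b F'(x(t),y(t))\,dt \quad\text{(by the Chain Rule in Theorem 4.4)} \\
  &= F(x(t),y(t))\,\Big|_a^b = F(B) - F(A)
\end{aligned}
\]

by the Fundamental Theorem of Calculus. QED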


1 See Taylor and Mann, §6.5. 

Theorem 4.5 can be thought of as the line integral version of the Fundamental Theorem
of Calculus. A real-valued function $F(x,y)$ such that $\nabla F(x,y) = \mathbf{f}(x,y)$ is called a potential
for $\mathbf{f}$. A conservative vector field is one which has a potential.


Example 4.5. Recall from Examples 4.2 and 4.3 in Section 4.1 that the line integral $\int_C (x^2 +
y^2)\,dx + 2xy\,dy$ was found to have the value $\frac{13}{3}$ for three different curves $C$ going from the
point $(0,0)$ to the point $(1,2)$. Use Theorem 4.5 to show that this line integral is indeed path
independent.

Solution: We need to find a real-valued function $F(x,y)$ such that

\[ \frac{\partial F}{\partial x} = x^2 + y^2 \quad\text{and}\quad \frac{\partial F}{\partial y} = 2xy . \]

Suppose that $\frac{\partial F}{\partial x} = x^2 + y^2$. Then we must have $F(x,y) = \frac{1}{3}x^3 + xy^2 + g(y)$ for some function
$g(y)$. So $\frac{\partial F}{\partial y} = 2xy + g'(y)$ satisfies the condition $\frac{\partial F}{\partial y} = 2xy$ if $g'(y) = 0$, i.e. $g(y) = K$, where $K$
is a constant. Since any choice for $K$ will do (why?), we pick $K = 0$. Thus, a potential $F(x,y)$
for $\mathbf{f}(x,y) = (x^2 + y^2)\,\mathbf{i} + 2xy\,\mathbf{j}$ exists, namely

\[ F(x,y) = \frac{1}{3}x^3 + xy^2 . \]

Hence the line integral $\int_C (x^2 + y^2)\,dx + 2xy\,dy$ is path independent.

Note that we can also verify that the value of the line integral of $\mathbf{f}$ along any curve $C$ going
from $(0,0)$ to $(1,2)$ will always be $\frac{13}{3}$, since by Theorem 4.5

\[ \int_C \mathbf{f} \cdot d\mathbf{r} = F(1,2) - F(0,0) = \frac{1}{3}(1)^3 + (1)(2)^2 - (0 + 0) = \frac{1}{3} + 4 = \frac{13}{3} . \]
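One way to double-check a candidate potential is to compare its numerical gradient with the vector field at a few points. The Python sketch below does this for the potential found in Example 4.5 using central differences; the sample points, the step size `h` and the helper name `numerical_gradient` are arbitrary assumptions of this illustration.

```python
def F(x, y):
    """Potential from Example 4.5."""
    return x**3 / 3 + x * y**2

def P(x, y):
    return x**2 + y**2

def Q(x, y):
    return 2 * x * y

def numerical_gradient(F, x, y, h=1e-5):
    """Central-difference approximation to (dF/dx, dF/dy) at (x, y)."""
    dFdx = (F(x + h, y) - F(x - h, y)) / (2 * h)
    dFdy = (F(x, y + h) - F(x, y - h)) / (2 * h)
    return dFdx, dFdy

sample_points = [(0.5, -1.0), (1.0, 2.0), (-2.0, 0.3)]   # arbitrary test points

max_error = 0.0
for (x, y) in sample_points:
    gx, gy = numerical_gradient(F, x, y)
    max_error = max(max_error, abs(gx - P(x, y)), abs(gy - Q(x, y)))

value = F(1, 2) - F(0, 0)   # Theorem 4.5 with A = (0,0), B = (1,2)
print(max_error, value)
```

A small `max_error` is consistent with $\nabla F = \mathbf{f}$, though a check at finitely many points is of course not a proof.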


A consequence of Theorem 4.5 in the special case where C is a closed curve, so that the 
endpoints A and B are the same point, is the following important corollary: 


Corollary 4.6. If a vector field $\mathbf{f}$ has a potential in a region $R$, then $\oint_C \mathbf{f} \cdot d\mathbf{r} = 0$ for any closed
curve $C$ in $R$ (i.e. $\oint_C \nabla F \cdot d\mathbf{r} = 0$ for any real-valued function $F(x,y)$).


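Example 4.6. Evaluate $\oint_C x\,dx + y\,dy$ for $C : x = 2\cos t$, $y = 3\sin t$, $0 \le t \le 2\pi$.

Solution: The vector field $\mathbf{f}(x,y) = x\,\mathbf{i} + y\,\mathbf{j}$ has a potential $F(x,y)$:

\[
\begin{aligned}
\frac{\partial F}{\partial x} = x \quad &\Rightarrow\quad F(x,y) = \frac{1}{2}x^2 + g(y) , \text{ so} \\
\frac{\partial F}{\partial y} = y \quad &\Rightarrow\quad g'(y) = y \quad\Rightarrow\quad g(y) = \frac{1}{2}y^2 + K
\end{aligned}
\]

for any constant $K$, so $F(x,y) = \frac{1}{2}x^2 + \frac{1}{2}y^2$ is a potential for $\mathbf{f}(x,y)$. Thus,

\[ \oint_C x\,dx + y\,dy = \oint_C \mathbf{f} \cdot d\mathbf{r} = 0 \]

by Corollary 4.6, since the curve $C$ is closed (it is the ellipse $\frac{x^2}{4} + \frac{y^2}{9} = 1$).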
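The conclusion of Example 4.6 can be confirmed by integrating directly around the ellipse. The following Python sketch is only a numerical check under the stated parametrization; the helper names are hypothetical.

```python
import math

def simpson(g, a, b, n=1000):
    """Composite Simpson's rule for the integral of g over [a, b]; n must be even."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3

# Ellipse C: x = 2 cos t, y = 3 sin t, 0 <= t <= 2*pi, with its derivatives.
x = lambda t: 2 * math.cos(t)
y = lambda t: 3 * math.sin(t)
dx = lambda t: -2 * math.sin(t)
dy = lambda t: 3 * math.cos(t)

# Integrand of the line integral of f = x i + y j, i.e. x*x'(t) + y*y'(t).
integrand = lambda t: x(t) * dx(t) + y(t) * dy(t)
closed_integral = simpson(integrand, 0.0, 2 * math.pi)

# Potential from the example; its values at the common endpoint A = B agree.
F = lambda x, y: 0.5 * x**2 + 0.5 * y**2
endpoint_difference = F(x(2 * math.pi), y(2 * math.pi)) - F(x(0.0), y(0.0))

print(closed_integral, endpoint_difference)
```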


Exercises

A

1. Evaluate $\oint_C (x^2 + y^2)\,dx + 2xy\,dy$ for $C : x = \cos t$, $y = \sin t$, $0 \le t \le 2\pi$.

2. Evaluate $\int_C (x^2 + y^2)\,dx + 2xy\,dy$ for $C : x = \cos t$, $y = \sin t$, $0 \le t \le \pi$.

3. Is there a potential $F(x,y)$ for $\mathbf{f}(x,y) = y\,\mathbf{i} - x\,\mathbf{j}$? If so, find one.

4. Is there a potential $F(x,y)$ for $\mathbf{f}(x,y) = x\,\mathbf{i} - y\,\mathbf{j}$? If so, find one.

5. Is there a potential $F(x,y)$ for $\mathbf{f}(x,y) = xy^2\,\mathbf{i} + x^3y\,\mathbf{j}$? If so, find one.

B

6. Let $\mathbf{f}(x,y)$ and $\mathbf{g}(x,y)$ be vector fields, let $a$ and $b$ be constants, and let $C$ be a curve in $\mathbb{R}^2$.
Show that

\[ \int_C (a\,\mathbf{f} \pm b\,\mathbf{g}) \cdot d\mathbf{r} = a \int_C \mathbf{f} \cdot d\mathbf{r} \pm b \int_C \mathbf{g} \cdot d\mathbf{r} . \]

7. Let $C$ be a curve whose arc length is $L$. Show that $\int_C 1\,ds = L$.

8. Let $f(x,y)$ and $g(x,y)$ be continuously differentiable real-valued functions in a region $R$.
Show that

\[ \oint_C f\,\nabla g \cdot d\mathbf{r} = -\oint_C g\,\nabla f \cdot d\mathbf{r} \]

for any closed curve $C$ in $R$. (Hint: Use Exercise 21 in Section 2.4.)

9. Let $\mathbf{f}(x,y) = \frac{-y}{x^2 + y^2}\,\mathbf{i} + \frac{x}{x^2 + y^2}\,\mathbf{j}$ for all $(x,y) \ne (0,0)$, and $C : x = \cos t$, $y = \sin t$, $0 \le t \le 2\pi$.

(a) Show that $\mathbf{f} = \nabla F$, for $F(x,y) = \tan^{-1}(y/x)$.

(b) Show that $\oint_C \mathbf{f} \cdot d\mathbf{r} = 2\pi$. Does this contradict Corollary 4.6? Explain.

C

10. Let $g(x)$ and $h(y)$ be differentiable functions, and let $\mathbf{f}(x,y) = h(y)\,\mathbf{i} + g(x)\,\mathbf{j}$. Can $\mathbf{f}$ have a
potential $F(x,y)$? If so, find it. You may assume that $F$ would be smooth. (Hint: Consider
the mixed partial derivatives of $F$.)

4.3 Green’s Theorem

We will now see a way of evaluating the line integral of a smooth vector field around a
simple closed curve. A vector field $\mathbf{f}(x,y) = P(x,y)\,\mathbf{i} + Q(x,y)\,\mathbf{j}$ is smooth if its component
functions $P(x,y)$ and $Q(x,y)$ are smooth. We will use Green’s Theorem (sometimes called
Green’s Theorem in the plane) to relate the line integral around a closed curve with a double
integral over the region inside the curve:

Theorem 4.7. (Green’s Theorem) Let $R$ be a region in $\mathbb{R}^2$ whose boundary is a simple
closed curve $C$ which is piecewise smooth. Let $\mathbf{f}(x,y) = P(x,y)\,\mathbf{i} + Q(x,y)\,\mathbf{j}$ be a smooth vector
field defined on both $R$ and $C$. Then

\[ \oint_C \mathbf{f} \cdot d\mathbf{r} = \iint_R \left( \frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y} \right) dA , \tag{4.21} \]

where $C$ is traversed so that $R$ is always on the left side of $C$.


Proof: We will prove the theorem in the case for a simple region $R$, that is, where the
boundary curve $C$ can be written as $C = C_1 \cup C_2$ in two distinct ways:

\[ C_1 = \text{the curve } y = y_1(x) \text{ from the point } X_1 \text{ to the point } X_2 \tag{4.22} \]

\[ C_2 = \text{the curve } y = y_2(x) \text{ from the point } X_2 \text{ to the point } X_1 , \tag{4.23} \]

where $X_1$ and $X_2$ are the points on $C$ farthest to the left and right, respectively; and

\[ C_1 = \text{the curve } x = x_1(y) \text{ from the point } Y_2 \text{ to the point } Y_1 \tag{4.24} \]

\[ C_2 = \text{the curve } x = x_2(y) \text{ from the point } Y_1 \text{ to the point } Y_2 , \tag{4.25} \]

where $Y_1$ and $Y_2$ are the lowest and highest points, respectively, on $C$. See Figure 4.3.1.

Figure 4.3.1

Integrate $P(x,y)$ around $C$ using the representation $C = C_1 \cup C_2$ given by (4.22) and (4.23).
Since $y = y_1(x)$ along $C_1$ (as $x$ goes from $a$ to $b$) and $y = y_2(x)$ along $C_2$ (as $x$ goes from $b$ to
$a$), as we see from Figure 4.3.1, then we have

\[
\begin{aligned}
\oint_C P(x,y)\,dx &= \int_{C_1} P(x,y)\,dx + \int_{C_2} P(x,y)\,dx \\
  &= \int_a^b P(x,y_1(x))\,dx + \int_b^a P(x,y_2(x))\,dx \\
  &= \int_a^b P(x,y_1(x))\,dx - \int_a^b P(x,y_2(x))\,dx \\
  &= -\int_a^b \bigl( P(x,y_2(x)) - P(x,y_1(x)) \bigr)\,dx \\
  &= -\int_a^b \Bigl( P(x,y)\,\Big|_{y=y_1(x)}^{y=y_2(x)} \Bigr)\,dx \\
  &= -\int_a^b \int_{y_1(x)}^{y_2(x)} \frac{\partial P(x,y)}{\partial y}\,dy\,dx \quad\text{(by the Fundamental Theorem of Calculus)} \\
  &= -\iint_R \frac{\partial P}{\partial y}\,dA .
\end{aligned}
\]

Likewise, integrate $Q(x,y)$ around $C$ using the representation $C = C_1 \cup C_2$ given by (4.24)
and (4.25). Since $x = x_1(y)$ along $C_1$ (as $y$ goes from $d$ to $c$) and $x = x_2(y)$ along $C_2$ (as $y$ goes
from $c$ to $d$), as we see from Figure 4.3.1, then we have

\[
\begin{aligned}
\oint_C Q(x,y)\,dy &= \int_{C_1} Q(x,y)\,dy + \int_{C_2} Q(x,y)\,dy \\
  &= \int_d^c Q(x_1(y),y)\,dy + \int_c^d Q(x_2(y),y)\,dy \\
  &= -\int_c^d Q(x_1(y),y)\,dy + \int_c^d Q(x_2(y),y)\,dy \\
  &= \int_c^d \bigl( Q(x_2(y),y) - Q(x_1(y),y) \bigr)\,dy \\
  &= \int_c^d \Bigl( Q(x,y)\,\Big|_{x=x_1(y)}^{x=x_2(y)} \Bigr)\,dy \\
  &= \int_c^d \int_{x_1(y)}^{x_2(y)} \frac{\partial Q(x,y)}{\partial x}\,dx\,dy \quad\text{(by the Fundamental Theorem of Calculus)} \\
  &= \iint_R \frac{\partial Q}{\partial x}\,dA , \text{ and so}
\end{aligned}
\]

\[
\begin{aligned}
\oint_C \mathbf{f} \cdot d\mathbf{r} &= \oint_C P(x,y)\,dx + \oint_C Q(x,y)\,dy \\
  &= -\iint_R \frac{\partial P}{\partial y}\,dA + \iint_R \frac{\partial Q}{\partial x}\,dA \\
  &= \iint_R \left( \frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y} \right) dA .
\end{aligned}
\]

QED

Though we proved Green’s Theorem only for a simple region R , the theorem can also be 
proved for more general regions (say, a union of simple regions). 2 


Example 4.7. Evaluate $\oint_C (x^2 + y^2)\,dx + 2xy\,dy$, where $C$ is the boundary (traversed counterclockwise)
of the region $R = \{(x,y) : 0 \le x \le 1,\ 2x^2 \le y \le 2x\}$.

Solution: $R$ is the shaded region in Figure 4.3.2. By Green’s Theorem, for
$P(x,y) = x^2 + y^2$ and $Q(x,y) = 2xy$, we have

\[
\begin{aligned}
\oint_C (x^2 + y^2)\,dx + 2xy\,dy &= \iint_R \left( \frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y} \right) dA \\
  &= \iint_R (2y - 2y)\,dA = \iint_R 0\,dA = 0 .
\end{aligned}
\]

Figure 4.3.2

We actually already knew that the answer was zero. Recall from Example 4.5 in Section
4.2 that the vector field $\mathbf{f}(x,y) = (x^2 + y^2)\,\mathbf{i} + 2xy\,\mathbf{j}$ has a potential function $F(x,y) = \frac{1}{3}x^3 + xy^2$,
and so $\oint_C \mathbf{f} \cdot d\mathbf{r} = 0$ by Corollary 4.6.
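Since both sides of Green's Theorem are zero in Example 4.7, it can be instructive to test the same region with a field whose line integral is not zero. The Python sketch below compares the boundary integral with the double integral over $R = \{0 \le x \le 1,\ 2x^2 \le y \le 2x\}$ for the field of the example and for the hypothetical field $-y\,\mathbf{i} + x\,\mathbf{j}$, which is not taken from the text. The partial derivatives are entered by hand, and all helper names are assumptions.

```python
def simpson(g, a, b, n=1000):
    """Composite Simpson's rule for the integral of g over [a, b]; n must be even."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3

def boundary_integral(P, Q):
    """Counterclockwise line integral of P dx + Q dy around the boundary of R."""
    # Lower edge: parabola x = t, y = 2t^2 from (0,0) to (1,2).
    lower = simpson(lambda t: P(t, 2 * t * t) + Q(t, 2 * t * t) * 4 * t, 0.0, 1.0)
    # Upper edge: line x = t, y = 2t, traversed from (1,2) back to (0,0),
    # so its contribution enters with a minus sign by (4.18).
    upper = simpson(lambda t: P(t, 2 * t) + Q(t, 2 * t) * 2, 0.0, 1.0)
    return lower - upper

def double_integral(integrand, n=200):
    """Iterated integral of integrand(x, y) over 0 <= x <= 1, 2x^2 <= y <= 2x."""
    inner = lambda x: simpson(lambda y: integrand(x, y), 2 * x * x, 2 * x, n)
    return simpson(inner, 0.0, 1.0, n)

# Field from Example 4.7: dQ/dx - dP/dy = 2y - 2y = 0.
line_f = boundary_integral(lambda x, y: x**2 + y**2, lambda x, y: 2 * x * y)
area_f = double_integral(lambda x, y: 2 * y - 2 * y)

# Hypothetical field -y i + x j: dQ/dx - dP/dy = 1 - (-1) = 2.
line_g = boundary_integral(lambda x, y: -y, lambda x, y: x)
area_g = double_integral(lambda x, y: 2.0)

print(line_f, area_f)
print(line_g, area_g)
```

For the second field both numbers should be close to $2/3$, which is twice the area of $R$.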


Example 4.8. Let $\mathbf{f}(x,y) = P(x,y)\,\mathbf{i} + Q(x,y)\,\mathbf{j}$, where

\[ P(x,y) = \frac{-y}{x^2 + y^2} \quad\text{and}\quad Q(x,y) = \frac{x}{x^2 + y^2} , \]

and let $R = \{(x,y) : 0 < x^2 + y^2 \le 1\}$. For the boundary curve $C : x^2 + y^2 = 1$, traversed counterclockwise,
it was shown in Exercise 9(b) in Section 4.2 that $\oint_C \mathbf{f} \cdot d\mathbf{r} = 2\pi$. But

\[
\frac{\partial Q}{\partial x} = \frac{y^2 - x^2}{(x^2 + y^2)^2} = \frac{\partial P}{\partial y}
\quad\Rightarrow\quad
\iint_R \left( \frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y} \right) dA = \iint_R 0\,dA = 0 .
\]

This would seem to contradict Green’s Theorem. However, note that $R$ is not the entire
region enclosed by $C$, since the point $(0,0)$ is not contained in $R$. That is, $R$ has a “hole” at
the origin, so Green’s Theorem does not apply.

2 See Taylor and Mann, § 15.31 for a discussion of some of the difficulties involved when the boundary curve
is “complicated”.





If we modify the region R to be the annulus $R = \{(x,y) : 1/4 \le x^2 + y^2 \le 1\}$ (see Figure 4.3.3), and take the “boundary” C of R to be $C = C_1 \cup C_2$, where $C_1$ is the unit circle $x^2 + y^2 = 1$ traversed counterclockwise and $C_2$ is the circle $x^2 + y^2 = 1/4$ traversed clockwise, then it can be shown (see Exercise 8) that

\[
\oint_C \mathbf{f}\cdot d\mathbf{r} = 0 .
\]

Figure 4.3.3 The annulus R

We would still have $\iint_R \left( \frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y} \right) dA = 0$, so for this R we would have

\[
\oint_C \mathbf{f}\cdot d\mathbf{r} = \iint_R \left( \frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y} \right) dA ,
\]

which shows that Green’s Theorem holds for the annular region R.
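The two circulations in Example 4.8 can be checked numerically. The following is a minimal sketch; the function name `circulation` and the midpoint-rule discretisation are choices made here for illustration, not taken from the text.

```python
import math


def circulation(radius, clockwise=False, n=10000):
    """Midpoint-rule estimate of the line integral of f = (-y, x)/(x^2 + y^2)
    once around the circle x^2 + y^2 = radius^2."""
    sign = -1.0 if clockwise else 1.0
    h = 2 * math.pi / n
    total = 0.0
    for k in range(n):
        t = sign * (k + 0.5) * h  # the angle runs backwards for a clockwise traversal
        x, y = radius * math.cos(t), radius * math.sin(t)
        dx = -sign * radius * math.sin(t)  # derivative of x along the traversal
        dy = sign * radius * math.cos(t)   # derivative of y along the traversal
        r2 = x * x + y * y
        total += (-y / r2 * dx + x / r2 * dy) * h
    return total


outer = circulation(1.0)                  # C_1: unit circle, counterclockwise
inner = circulation(0.5, clockwise=True)  # C_2: radius 1/2, clockwise
print(outer, inner, outer + inner)
```

The outer circle alone gives approximately 2π, while adding the clockwise inner circle brings the total for the annulus boundary to approximately 0.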


It turns out that Green’s Theorem can be extended to multiply connected regions, that is, 
regions like the annulus in Example 4.8, which have one or more regions cut out from the 
interior, as opposed to discrete points being cut out. For such regions, the “outer” boundary 
and the “inner” boundaries are traversed so that R is always on the left side. 



(a) Region R with one hole 



(b) Region R with two holes 


Figure 4.3.4 Multiply connected regions 


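The intuitive idea for why Green’s Theorem holds for multiply connected regions is shown in Figure 4.3.4 above. The idea is to cut “slits” between the boundaries of a multiply connected region R so that R is divided into subregions which do not have any “holes”. For example, in Figure 4.3.4(a) the region R is the union of the regions $R_1$ and $R_2$, which are divided by the slits indicated by the dashed lines. Those slits are part of the boundary of both $R_1$ and $R_2$, and we traverse them in the manner indicated by the arrows. Notice that along each slit the boundary of $R_1$ is traversed in the opposite direction to that of $R_2$, which means that the line integrals of f along those slits cancel each other out. Since $R_1$ and $R_2$ do not have holes in them, Green’s Theorem holds in each subregion, so that

\[
\oint_{\text{bdy of } R_1} \mathbf{f}\cdot d\mathbf{r} = \iint_{R_1} \left( \frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y} \right) dA
\quad\text{and}\quad
\oint_{\text{bdy of } R_2} \mathbf{f}\cdot d\mathbf{r} = \iint_{R_2} \left( \frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y} \right) dA .
\]

But since the line integrals along the slits cancel out, we have

\[
\oint_{C_1 \cup C_2} \mathbf{f}\cdot d\mathbf{r} = \oint_{\text{bdy of } R_1} \mathbf{f}\cdot d\mathbf{r} + \oint_{\text{bdy of } R_2} \mathbf{f}\cdot d\mathbf{r} ,
\]

and so

\[
\oint_{C_1 \cup C_2} \mathbf{f}\cdot d\mathbf{r}
= \iint_{R_1} \left( \frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y} \right) dA + \iint_{R_2} \left( \frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y} \right) dA
= \iint_{R} \left( \frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y} \right) dA ,
\]

which shows that Green’s Theorem holds in the region R. A similar argument shows that the theorem holds in the region with two holes shown in Figure 4.3.4(b).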

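We know from Corollary 4.6 that when a smooth vector field $\mathbf{f}(x,y) = P(x,y)\,\mathbf{i} + Q(x,y)\,\mathbf{j}$ on a region R (whose boundary is a piecewise smooth, simple closed curve C) has a potential in R, then $\oint_C \mathbf{f}\cdot d\mathbf{r} = 0$. And if the potential F(x,y) is smooth in R, then $\frac{\partial F}{\partial x} = P$ and $\frac{\partial F}{\partial y} = Q$, and so we know that

\[
\frac{\partial^2 F}{\partial y\,\partial x} = \frac{\partial^2 F}{\partial x\,\partial y}
\quad\Rightarrow\quad
\frac{\partial P}{\partial y} = \frac{\partial Q}{\partial x} \ \text{ in } R.
\]

Conversely, if $\frac{\partial P}{\partial y} = \frac{\partial Q}{\partial x}$ in R then

\[
\oint_C \mathbf{f}\cdot d\mathbf{r} = \iint_R \left( \frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y} \right) dA = \iint_R 0\,dA = 0 .
\]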


For a simply connected region R (i.e. a region with no holes), the following can be shown:


The following statements are equivalent for a simply connected region R in $\mathbb{R}^2$:

(a) $\mathbf{f}(x,y) = P(x,y)\,\mathbf{i} + Q(x,y)\,\mathbf{j}$ has a smooth potential F(x,y) in R

(b) $\int_C \mathbf{f}\cdot d\mathbf{r}$ is independent of the path for any curve C in R

(c) $\oint_C \mathbf{f}\cdot d\mathbf{r} = 0$ for every simple closed curve C in R

(d) $\frac{\partial P}{\partial y} = \frac{\partial Q}{\partial x}$ in R (in this case, the differential form $P\,dx + Q\,dy$ is exact)
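Statements (a) and (d) can be spot-checked numerically for the field of Example 4.7. The sketch below uses central differences at a few sample points; the helper `partial`, the step size and the sample points are assumptions chosen here for illustration, and a numerical check of this kind is evidence, not a proof.

```python
def P(x, y):
    return x**2 + y**2


def Q(x, y):
    return 2 * x * y


def F(x, y):
    # Potential function quoted from Example 4.5: F = x^3/3 + x y^2
    return x**3 / 3 + x * y**2


def partial(g, x, y, wrt, h=1e-5):
    """Central-difference estimate of a first partial derivative of g at (x, y)."""
    if wrt == "x":
        return (g(x + h, y) - g(x - h, y)) / (2 * h)
    return (g(x, y + h) - g(x, y - h)) / (2 * h)


points = [(0.3, -1.2), (1.0, 2.0), (-0.7, 0.4)]

# (d): dQ/dx - dP/dy should vanish at every sample point.
max_curl = max(abs(partial(Q, x, y, "x") - partial(P, x, y, "y")) for x, y in points)

# (a): the gradient of F should reproduce (P, Q).
max_grad_err = max(
    abs(partial(F, x, y, "x") - P(x, y)) + abs(partial(F, x, y, "y") - Q(x, y))
    for x, y in points
)
print(max_curl, max_grad_err)
```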
Exercises


For Exercises 1-4, use Green’s Theorem to evaluate the given line integral around the curve C, traversed counterclockwise.


1. $\oint_C (x^2 - y^2)\,dx + 2xy\,dy$; C is the boundary of $R = \{(x,y) : 0 \le x \le 1,\ 2x^2 \le y \le 2x\}$




2. $\oint_C x^2 y\,dx + 2xy\,dy$; C is the boundary of $R = \{(x,y) : 0 \le x \le 1,\ x^2 \le y \le x\}$

3. $\oint_C 2y\,dx - 3x\,dy$; C is the circle $x^2 + y^2 = 1$

4. $\oint_C (e^{x^2} + y^2)\,dx + (e^{y^2} + x^2)\,dy$; C is the boundary of the triangle with vertices (0,0), (4,0) and (0,4)


5. Is there a potential F(x,y) for $\mathbf{f}(x,y) = (y^2 + 3x^2)\,\mathbf{i} + 2xy\,\mathbf{j}$? If so, find one.

6. Is there a potential F(x,y) for $\mathbf{f}(x,y) = (x^3\cos(xy) + 2x\sin(xy))\,\mathbf{i} + x^2 y\cos(xy)\,\mathbf{j}$? If so, find one.

7. Is there a potential F(x,y) for $\mathbf{f}(x,y) = (8xy + 3)\,\mathbf{i} + 4(x^2 + y)\,\mathbf{j}$? If so, find one.

8. Show that for any constants a, b and any closed simple curve C, $\oint_C a\,dx + b\,dy = 0$.

B

9. For the vector field f as in Example 4.8, show directly that $\oint_C \mathbf{f}\cdot d\mathbf{r} = 0$, where C is the boundary of the annulus $R = \{(x,y) : 1/4 \le x^2 + y^2 \le 1\}$ traversed so that R is always on the left.

10. Evaluate $\oint_C e^x \sin y\,dx + (y^3 + e^x \cos y)\,dy$, where C is the boundary of the rectangle with vertices (1,-1), (1,1), (-1,1) and (-1,-1), traversed counterclockwise.

11. For a region R bounded by a simple closed curve C, show that the area A of R is

\[
A = -\oint_C y\,dx = \oint_C x\,dy = \frac{1}{2}\oint_C x\,dy - y\,dx ,
\]

where C is traversed so that R is always on the left. (Hint: Use Green’s Theorem and the fact that $A = \iint_R 1\,dA$.)
Chapter eight

Vector Algebra and functions


Overview

Many of you will know a good deal already about Vector Algebra: how to add and subtract vectors, how to take scalar and vector products of vectors, and something of how to describe geometric and physical entities using vectors. This course will remind you about that good stuff, but goes on to introduce you to the subject of Vector Calculus which, like it says on the can, combines vector algebra with calculus.

To give you a feeling for the issues, suppose you were interested in the temperature T of water in a river. Temperature T is a scalar, and will certainly be a function of a position vector $\mathbf{x} = (x, y, z)$ and may also be a function of time t: $T = T(\mathbf{x}, t)$. It is a scalar field.

Suppose now that you kept y, z, t constant, and asked what is the change in temperature as you move a small amount in x? No doubt you'd be interested in calculating $\partial T/\partial x$. Similarly if you kept the point fixed, and asked how the temperature changes with time, you would be interested in $\partial T/\partial t$.

But why restrict ourselves to movements up-down, left-right, etc? Suppose you wanted to know what the change in temperature is along an arbitrary direction. You would be interested in

\[
\frac{dT}{d\mathbf{x}}
\]

but how would you calculate that? Is $dT/d\mathbf{x}$ a vector or a scalar?

Now let’s dive into the flow. At each point x in the stream, at each time t, there will be a stream velocity v(x, t). The local stream velocity can be viewed directly using modern techniques such as laser Doppler anemometry, or traditional techniques such as throwing twigs in. The point now is that v is a function that has the same four input variables as temperature did, but its output result is a vector. We may be interested in places x where the stream suddenly accelerates, or vortices where the stream curls around dangerously. That is, we will be interested in finding the acceleration of the stream, the gradient of its velocity. We may be interested in the magnitude of the acceleration (a scalar). Equally, we may be interested in the acceleration as a vector, so that we can apply Newton’s law and figure out the force.

This is the stuff of vector calculus.
Grey book

Vector algebra: scalar and vector products; scalar and vector triple products; geometric applications. Differentiation of a vector function; scalar and vector fields. Gradient, divergence and curl - definitions and physical interpretations; product formulae; curvilinear coordinates. Gauss' and Stokes’ theorems and evaluation of integrals over lines, surfaces and volumes. Derivation of continuity equations and Laplace's equation in Cartesian, cylindrical and spherical coordinate systems.

Course Content 

• Introduction and revision of elementary concepts, scalar product, vector product. 

• Triple products, multiple products, applications to geometry. 

• Differentiation and integration of vector functions of a single variable. 

• Curvilinear coordinate systems. Line, surface and volume integrals. 

• Vector operators. 

• Vector Identities. 

• Gauss' and Stokes' Theorems. 

• Engineering Applications. 

Learning Outcomes 

You should be comfortable with expressing systems (especially those in 2 and 3 dimensions) using 
vector quantities and manipulating these vectors without necessarily going back to some underlying 
coordinates. 

You should have a sound grasp of the concept of a vector field, and be able to link this idea to 
descriptions of various physical phenomena. 

You should have a good intuition of the physical meaning of the various vector calculus operators 
and the important related theorems. You should be able to interpret the formulae describing 
physical systems in terms of this intuition. 

References 

Although these notes cover the material you need to know, wider reading is essential. Different explanations and different diagrams in books will give you the perspective to glue everything together, and further worked examples give you the confidence to tackle the tute sheets.


• J Heading, "Mathematical Methods in Science and Engineering", 2nd ed., Ch. 13, (Arnold).

• G Stephenson, "Mathematical Methods for Science Students", 2nd ed., Ch. 19, (Longman).

• E Kreyszig, "Advanced Engineering Mathematics", 6th ed., Ch. 6, (Wiley).

• K F Riley, M P Hobson and S J Bence, "Mathematical Methods for Physics and Engineering", Chs. 6, 8 and 9, (CUP).

• A J M Spencer, et al., "Engineering Mathematics", Vol. 1, Ch. 6, (Van Nostrand Reinhold).

• H M Schey, "Div, Grad, Curl and All That", (Norton).
Lecture 1


Vector Algebra


1.1 Vectors

Many physical quantities, such as mass, time and temperature, are fully specified by one number or magnitude. They are scalars. But other quantities require more than one number to describe them. They are vectors. You have already met vectors in their more pure mathematical sense in your course on linear algebra (matrices and vectors), but often in the physical world, these numbers specify a magnitude and a direction: a total of two numbers in a 2D planar world, and three numbers in 3D.

Obvious examples are velocity, acceleration, electric field, and force. Below, probably all our examples will be of these “magnitude and direction” vectors, but we should not forget that many of the results extend to the wider realm of vectors.

There are three slightly different types of vectors: 

• Free vectors: In many situations only the magnitude and direction of a vector are important, and we can translate them at will (with 3 degrees of freedom for a vector in 3-dimensions).

• Sliding vectors: In mechanics the line of action of a force is often important 
for deriving moments. The force vector can slide with 1 degree of freedom. 

• Bound or position vectors: When describing lines, curves etc in space, it is 
obviously important that the origin and head of the vector are not translated 
about arbitrarily. The origins of position vectors all coincide at an overall 
origin O. 

One of the advantages of using vectors is that it frees much of the analysis from the restriction of arbitrarily imposed coordinate frames. For example, if two free vectors are equal we need only say that their magnitudes and directions are equal, and that can be done with a drawing that is independent of any coordinate system.

However, coordinate systems are ultimately useful, so it is useful to introduce the idea of vector components. Try to spot things in the notes that are independent of coordinate system.

Figure 1.1:

1.1.1 Vector elements or components in a coordinate frame 

A method of representing a vector is to list the values of its elements or components 
in a sufficient number of different (preferably mutually perpendicular) directions, 
depending on the dimension of the vector. These specified directions define a 
coordinate frame. In this course we will mostly restrict our attention to the 
3-dimensional Cartesian coordinate frame O(x, y, z). When we come to examine
vector fields later in the course you will use curvilinear coordinate frames, especially 
3D spherical and cylindrical polars, and 2D plane polar, coordinate systems. 



Figure 1.2: Vector components. 

In a Cartesian coordinate frame we write 

\[
\mathbf{a} = [a_1, a_2, a_3] = [x_2 - x_1,\ y_2 - y_1,\ z_2 - z_1] \quad\text{or}\quad \mathbf{a} = [a_x, a_y, a_z]
\]

as sketched in Figure 1.2. Defining $\hat{\imath}, \hat{\jmath}, \hat{k}$ as unit vectors in the x, y, z directions

\[
\hat{\imath} = [1,0,0] \qquad \hat{\jmath} = [0,1,0] \qquad \hat{k} = [0,0,1]
\]

we could also write

\[
\mathbf{a} = a_1\hat{\imath} + a_2\hat{\jmath} + a_3\hat{k} .
\]

Figure 1.3: (a) Addition of two vectors is commutative, but subtraction isn't. Note that the coordinate frame is irrelevant. (b) Addition of three vectors is associative.


Although we will be most often dealing with vectors in 3-space, you should not 
think that general vectors are limited to three components. 

In these notes we will use bold font to represent vectors, e.g. a, ω. In your written work, underline the vector symbol and be meticulous about doing so. We shall use the hat to denote a unit vector.

1.1.2 Vector equality 

Two free vectors are said to be equal iff their lengths and directions are the same. 
If we use a coordinate frame, we might say that corresponding components of 
the two vectors must be equal. This definition of equality will also do for position 
vectors, but for sliding vectors we must add that the line of action must be identical 
too. 

1.1.3 Vector magnitude and unit vectors 

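Provided we use an orthogonal coordinate system, the magnitude of a 3-vector is

\[
a = |\mathbf{a}| = \sqrt{a_1^2 + a_2^2 + a_3^2} .
\]

To find the unit vector in the direction of a, simply divide by its magnitude

\[
\hat{\mathbf{a}} = \frac{\mathbf{a}}{|\mathbf{a}|} .
\]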


1.1.4 Vector Addition and subtraction 

Vectors are added/subtracted by adding/subtracting corresponding components, 
exactly as for matrices. Thus 

\[
\mathbf{a} + \mathbf{b} = [a_1 + b_1,\ a_2 + b_2,\ a_3 + b_3] .
\]

Addition follows the parallelogram construction of Figure 1.3(a). Subtraction (a − b) is defined as the addition (a + (−b)). It is useful to remember that the vector a − b goes from b to a.

The following results follow immediately from the above definition of vector addition:

(a) a + b = b + a (commutativity) (Figure 1.3(a)) 

(b) (a + b) + c = a + (b + c) = a + b + c (associativity) (Figure 1.3(b)) 

(c) a + 0 = 0 + a = a, where the zero vector is 0 = [0, 0, 0], 

(d) a + (-a) = 0 

1.1.5 Multiplication of a vector by a scalar. (NOT the scalar product!) 

Just as for matrices, multiplication of a vector a by a scalar c is defined as multiplication of each component by c, so that

\[
c\mathbf{a} = [ca_1,\ ca_2,\ ca_3].
\]

It follows that:

\[
|c\mathbf{a}| = \sqrt{(ca_1)^2 + (ca_2)^2 + (ca_3)^2} = |c|\,|\mathbf{a}|.
\]

The direction of the vector will reverse if c is negative, but otherwise is unaffected. 
(By the way, a vector where the sign is uncertain is called a director.) 
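The component-wise rules of Sections 1.1.3 to 1.1.5 translate directly into code. This is a minimal sketch using plain Python lists; the function names `add`, `scale`, `magnitude` and `unit` are illustrative choices, not notation from the notes.

```python
import math


def add(a, b):
    """Component-wise sum of two vectors of equal length."""
    return [ai + bi for ai, bi in zip(a, b)]


def scale(c, a):
    """Multiply every component of a by the scalar c."""
    return [c * ai for ai in a]


def magnitude(a):
    """Euclidean length, valid in an orthogonal coordinate frame."""
    return math.sqrt(sum(ai * ai for ai in a))


def unit(a):
    """Unit vector in the direction of a (a must be non-zero)."""
    return scale(1.0 / magnitude(a), a)


a = [3.0, 4.0, 12.0]
b = [1.0, -2.0, 2.0]

print(add(a, b), add(b, a))        # addition is commutative
print(add(a, scale(-1.0, b)))      # a - b, defined as a + (-b)
print(magnitude(scale(-2.0, a)))   # |ca| = |c||a|
print(magnitude(unit(a)))          # a unit vector has length 1
```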

Example

Q. Coulomb's law states that the electrostatic force on charged particle Q due to another charged particle $q_1$ is

\[
\mathbf{F} = K\,\frac{Q q_1}{r^2}\,\hat{\mathbf{r}}
\]

where r is the vector from $q_1$ to Q and $\hat{\mathbf{r}}$ is the unit vector in that same direction. (Note that the rule “unlike charges attract, like charges repel” is built into this formula.) The force between two particles is not modified by the presence of other charged particles.

Hence write down an expression for the force on Q at R due to N charges $q_i$ at $\mathbf{r}_i$.

A. The vector from $q_i$ to Q is $\mathbf{R} - \mathbf{r}_i$. The unit vector in that direction is $(\mathbf{R} - \mathbf{r}_i)/|\mathbf{R} - \mathbf{r}_i|$, so the resultant force is

\[
\mathbf{F}(\mathbf{R}) = \sum_{i=1}^{N} K\,\frac{Q q_i}{|\mathbf{R} - \mathbf{r}_i|^3}\,(\mathbf{R} - \mathbf{r}_i) .
\]

Note that F(R) is a vector field.
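The superposition sum in this example can be sketched in a few lines. The function name `coulomb_force`, the default value K = 1 and the sample charges are assumptions made purely for illustration; a real calculation would use the Coulomb constant for the chosen unit system.

```python
import math


def coulomb_force(Q, R, charges, K=1.0):
    """Resultant force on charge Q at position R due to a list of (q_i, r_i) pairs.

    K is the constant in Coulomb's law; K = 1 is an arbitrary choice of units.
    """
    force = [0.0, 0.0, 0.0]
    for q, r in charges:
        sep = [Rc - rc for Rc, rc in zip(R, r)]   # vector from q_i to Q
        dist = math.sqrt(sum(s * s for s in sep))
        factor = K * Q * q / dist**3              # K Q q_i / |R - r_i|^3
        force = [f + factor * s for f, s in zip(force, sep)]
    return force


# One like charge at the origin pushes Q away along +x.
single = coulomb_force(1.0, [2.0, 0.0, 0.0], [(2.0, [0.0, 0.0, 0.0])])

# Two equal charges placed symmetrically about Q give no net force.
balanced = coulomb_force(
    1.0, [0.0, 0.0, 0.0], [(1.0, [1.0, 0.0, 0.0]), (1.0, [-1.0, 0.0, 0.0])]
)
print(single, balanced)
```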
1.2 Scalar, dot, or inner product

This product of two vectors results in a scalar quantity and is defined as follows for 3-component vectors:

\[
\mathbf{a}\cdot\mathbf{b} = a_1 b_1 + a_2 b_2 + a_3 b_3 .
\]

Note that

\[
\mathbf{a}\cdot\mathbf{a} = a_1^2 + a_2^2 + a_3^2 = |\mathbf{a}|^2 = a^2 .
\]

The following laws of multiplication follow immediately from the definition:

(a) $\mathbf{a}\cdot\mathbf{b} = \mathbf{b}\cdot\mathbf{a}$ (commutativity)

(b) $\mathbf{a}\cdot(\mathbf{b} + \mathbf{c}) = \mathbf{a}\cdot\mathbf{b} + \mathbf{a}\cdot\mathbf{c}$ (distributivity with respect to vector addition)

(c) $(\lambda\mathbf{a})\cdot\mathbf{b} = \lambda(\mathbf{a}\cdot\mathbf{b}) = \mathbf{a}\cdot(\lambda\mathbf{b})$ (scalar multiple of a scalar product of two vectors)

1.2.1 Geometrical interpretation of scalar product 



(a) (b) 

Figure 1.4: (a) Cosine rule, (b) Projection of b onto a. 


Consider the square magnitude of the vector a − b. By the rules of the scalar product, this is

\[
|\mathbf{a} - \mathbf{b}|^2 = (\mathbf{a} - \mathbf{b})\cdot(\mathbf{a} - \mathbf{b})
= \mathbf{a}\cdot\mathbf{a} + \mathbf{b}\cdot\mathbf{b} - 2(\mathbf{a}\cdot\mathbf{b})
= a^2 + b^2 - 2(\mathbf{a}\cdot\mathbf{b}) .
\]

But, by the cosine rule for the triangle OAB (Figure 1.4a), the length $AB^2$ is given by

\[
|\mathbf{a} - \mathbf{b}|^2 = a^2 + b^2 - 2ab\cos\theta
\]

where θ is the angle between the two vectors. It follows that

\[
\mathbf{a}\cdot\mathbf{b} = ab\cos\theta ,
\]

which is independent of the co-ordinate system used, and that $|\mathbf{a}\cdot\mathbf{b}| \le ab$. Conversely, the cosine of the angle between vectors a and b is given by $\cos\theta = \mathbf{a}\cdot\mathbf{b}/(ab)$.

1.2.2 Projection of one vector onto the other 

Another way of describing the scalar product is as the product of the magnitude of one vector and the component of the other in the direction of the first, since $b\cos\theta$ is the component of b in the direction of a and vice versa (Figure 1.4b).

Projection is particularly useful when the second vector is a unit vector: $\mathbf{a}\cdot\hat{\imath}$ is the component of a in the direction of $\hat{\imath}$.

Notice that if we wanted the vector component of b in the direction of a we would write

\[
(\mathbf{b}\cdot\hat{\mathbf{a}})\,\hat{\mathbf{a}} = \frac{1}{a^2}\,(\mathbf{b}\cdot\mathbf{a})\,\mathbf{a} .
\]

In the particular case $\mathbf{a}\cdot\mathbf{b} = 0$, the angle between the two vectors is a right angle and the vectors are said to be mutually orthogonal or perpendicular: neither vector has any component in the direction of the other.

An orthonormal coordinate system is characterised by $\hat{\imath}\cdot\hat{\imath} = \hat{\jmath}\cdot\hat{\jmath} = \hat{k}\cdot\hat{k} = 1$; and $\hat{\imath}\cdot\hat{\jmath} = \hat{\jmath}\cdot\hat{k} = \hat{k}\cdot\hat{\imath} = 0$.


1.2.3 A scalar product is an “inner product” 

So far we have been writing our vectors as row vectors $\mathbf{a} = [a_1, a_2, a_3]$. This is convenient because it takes up less room than writing column vectors

\[
\mathbf{a} = \begin{bmatrix} a_1 \\ a_2 \\ a_3 \end{bmatrix} .
\]

In matrix algebra vectors are more usually defined as column vectors, as in

\[
\begin{bmatrix} M_{11} & M_{12} & M_{13} \\ M_{21} & M_{22} & M_{23} \\ M_{31} & M_{32} & M_{33} \end{bmatrix}
\begin{bmatrix} a_1 \\ a_2 \\ a_3 \end{bmatrix}
=
\begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix}
\]

and a row vector is written as $\mathbf{a}^T$. Now for most of our work we can be quite relaxed about this minor difference, but here let us be fussy.

Why? Simply to point out that the scalar product is also the inner product more commonly used in linear algebra. It is defined as $\mathbf{a}^T\mathbf{b}$ when vectors are column vectors, as

\[
\mathbf{a}\cdot\mathbf{b} = \mathbf{a}^T\mathbf{b} = [a_1, a_2, a_3] \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix} = a_1 b_1 + a_2 b_2 + a_3 b_3 .
\]

Here we treat an n-dimensional column vector as an n × 1 matrix.

(Remember that if you multiply two matrices $M_{m\times n} N_{n\times p}$ then M must have the same number of columns as N has rows (here denoted by n) and the result has size (rows × columns) of m × p. So for n-dimensional column vectors a and b, $\mathbf{a}^T$ is a 1 × n matrix and b is an n × 1 matrix, so the product $\mathbf{a}^T\mathbf{b}$ is a 1 × 1 matrix, which is (at last!) a scalar.)


Examples

Q1. A force F is applied to an object as it moves by a small amount δr. What work is done on the object by the force?

A1. The work done is equal to the component of force in the direction of the displacement multiplied by the displacement itself. This is just a scalar product:

\[
\delta W = \mathbf{F}\cdot\delta\mathbf{r}
\]

Q2. A cube has four diagonals, connecting opposite vertices. What is the angle between an adjacent pair?

A2. Well, you could plod through using Pythagoras' theorem to find the length of the diagonal from cube vertex to cube centre, and perhaps you should to check the following answer.

The directions of the diagonals are [±1, ±1, ±1]. The ones shown in the figure are [1,1,1] and [−1,1,1]. The angle is thus

\[
\theta = \cos^{-1}\frac{[1,1,1]\cdot[-1,1,1]}{\sqrt{1^2 + 1^2 + 1^2}\,\sqrt{(-1)^2 + 1^2 + 1^2}} = \cos^{-1}\frac{1}{3} .
\]
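The cube-diagonal angle is easy to confirm numerically from cos θ = a·b/(ab). In this short sketch the helper names `dot` and `angle_between` are illustrative, not from the notes.

```python
import math


def dot(a, b):
    """Scalar product of two vectors given as sequences of components."""
    return sum(x * y for x, y in zip(a, b))


def angle_between(a, b):
    """Angle in radians between a and b, from cos(theta) = a.b / (|a||b|)."""
    return math.acos(dot(a, b) / math.sqrt(dot(a, a) * dot(b, b)))


theta = angle_between([1, 1, 1], [-1, 1, 1])
print(math.degrees(theta))  # roughly 70.5 degrees
```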
Q3. A pinball moving in a plane with velocity s bounces (in a purely elastic impact) from a baffle whose endpoints are p and q. What is the velocity vector after the bounce?

A3. Best to refer everything to a coordinate frame with principal directions $\hat{\mathbf{u}}$ along and $\hat{\mathbf{v}}$ perpendicular to the baffle:

\[
\hat{\mathbf{u}} = \frac{\mathbf{q} - \mathbf{p}}{|\mathbf{q} - \mathbf{p}|} , \qquad \hat{\mathbf{v}} = [-u_y,\ u_x]
\]

Thus the velocity before impact is

\[
\mathbf{s}_{\text{before}} = (\mathbf{s}\cdot\hat{\mathbf{u}})\,\hat{\mathbf{u}} + (\mathbf{s}\cdot\hat{\mathbf{v}})\,\hat{\mathbf{v}}
\]

After the impact, the component of velocity in the direction of the baffle is unchanged and the component normal to the baffle is negated:

\[
\mathbf{s}_{\text{after}} = (\mathbf{s}\cdot\hat{\mathbf{u}})\,\hat{\mathbf{u}} - (\mathbf{s}\cdot\hat{\mathbf{v}})\,\hat{\mathbf{v}}
\]
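The bounce rule can be sketched as a small function. The name `bounce` and the sample velocities are hypothetical, introduced here only to illustrate the decomposition along and normal to the baffle.

```python
import math


def bounce(s, p, q):
    """Velocity after an elastic bounce of a 2D velocity s off the baffle from p to q."""
    d = [q[0] - p[0], q[1] - p[1]]
    length = math.hypot(d[0], d[1])
    u = [d[0] / length, d[1] / length]   # unit vector along the baffle
    v = [-u[1], u[0]]                    # unit vector perpendicular to the baffle
    along = s[0] * u[0] + s[1] * u[1]    # s . u, unchanged by the impact
    normal = s[0] * v[0] + s[1] * v[1]   # s . v, negated by the impact
    return [along * u[0] - normal * v[0], along * u[1] - normal * v[1]]


# Baffle along the x-axis: the y-component of the velocity flips.
flat = bounce([1.0, -1.0], [0.0, 0.0], [2.0, 0.0])

# Baffle along the line y = x: the velocity components swap.
diagonal = bounce([1.0, 0.0], [0.0, 0.0], [1.0, 1.0])
print(flat, diagonal)
```

Because only the sign of the normal component changes, the speed is the same before and after the bounce.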

1.2.4 Direction cosines use projection 

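Direction cosines are commonly used in the field of crystallography. The quantities

\[
\lambda = \frac{\mathbf{a}\cdot\hat{\imath}}{a} , \qquad \mu = \frac{\mathbf{a}\cdot\hat{\jmath}}{a} , \qquad \nu = \frac{\mathbf{a}\cdot\hat{k}}{a}
\]

represent the cosines of the angles which the vector a makes with the co-ordinate vectors $\hat{\imath}, \hat{\jmath}, \hat{k}$ and are known as the direction cosines of the vector a. Since $\mathbf{a}\cdot\hat{\imath} = a_1$ etc, it follows immediately that $\mathbf{a} = a(\lambda\hat{\imath} + \mu\hat{\jmath} + \nu\hat{k})$ and $\lambda^2 + \mu^2 + \nu^2 = \frac{1}{a^2}\left[a_1^2 + a_2^2 + a_3^2\right] = 1$.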
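A quick numerical illustration of the direction cosines follows; the function name `direction_cosines` and the sample vector are chosen here for illustration only.

```python
import math


def direction_cosines(a):
    """Cosines of the angles between a and the coordinate axes: a_i / |a|."""
    mag = math.sqrt(sum(x * x for x in a))
    return [x / mag for x in a]


lam, mu, nu = direction_cosines([1.0, 2.0, 2.0])
print(lam, mu, nu)                 # 1/3, 2/3, 2/3 for this vector of length 3
print(lam**2 + mu**2 + nu**2)      # the squares sum to 1
```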

Figure 1.5: The direction cosines are cosines of the angles shown.

1.3 Vector or cross product

The vector product of two vectors a and b is denoted by a × b and is defined as follows

\[
\mathbf{a}\times\mathbf{b} = (a_2 b_3 - a_3 b_2)\,\hat{\imath} + (a_3 b_1 - a_1 b_3)\,\hat{\jmath} + (a_1 b_2 - a_2 b_1)\,\hat{k} .
\]

It is MUCH more easily remembered in terms of the (pseudo-)determinant

\[
\mathbf{a}\times\mathbf{b} = \begin{vmatrix} \hat{\imath} & \hat{\jmath} & \hat{k} \\ a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \end{vmatrix}
\]

where the top row consists of the vectors $\hat{\imath}, \hat{\jmath}, \hat{k}$ rather than scalars.

Since a determinant with two equal rows has value zero, it follows that a × a = 0. It is also easily verified that (a × b) · a = (a × b) · b = 0, so that a × b is orthogonal (perpendicular) to both a and b, as shown in Figure 1.6.

Note that $\hat{\imath}\times\hat{\jmath} = \hat{k}$, $\hat{\jmath}\times\hat{k} = \hat{\imath}$, and $\hat{k}\times\hat{\imath} = \hat{\jmath}$.

The magnitude of the vector product can be obtained by showing that

\[
|\mathbf{a}\times\mathbf{b}|^2 + (\mathbf{a}\cdot\mathbf{b})^2 = a^2 b^2
\]

from which it follows that

\[
|\mathbf{a}\times\mathbf{b}| = ab\sin\theta ,
\]

which is again independent of the co-ordinate system used. This is left as an exercise.

Unlike the scalar product, the vector product does not satisfy commutativity but is in fact anti-commutative, in that a × b = −b × a. Moreover the vector product does not satisfy the associative law of multiplication either since, as we shall see later, a × (b × c) ≠ (a × b) × c.
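The properties just listed can be checked on sample integer vectors. This is only a numerical illustration (it does not replace the exercise of proving the magnitude identity); the helper names and sample vectors are chosen here.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    """Vector product from the component definition."""
    return [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]


a, b, c = [1, 2, 3], [2, 3, 1], [3, 1, 2]
axb = cross(a, b)

print(axb, dot(axb, a), dot(axb, b))                        # orthogonal to both a and b
print(dot(axb, axb) + dot(a, b) ** 2, dot(a, a) * dot(b, b))  # |a x b|^2 + (a.b)^2 = a^2 b^2
print(cross(b, a))                                          # equals -(a x b)
print(cross(a, cross(b, c)), cross(cross(a, b), c))         # not associative
```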

Since the vector product is known to be orthogonal to both the vectors which form the product, it merely remains to specify its sense with respect to these vectors. Assuming that the co-ordinate vectors form a right-handed set in the order $\hat{\imath}, \hat{\jmath}, \hat{k}$ it can be seen that the sense of the vector product is also right handed, i.e. the vector product has the same sense as the co-ordinate system used.

\[
\hat{\imath}\times\hat{\jmath} = \begin{vmatrix} \hat{\imath} & \hat{\jmath} & \hat{k} \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{vmatrix} = \hat{k} .
\]

In practice, figure out the direction from a right-handed screw twisted from the first to second vector as shown in Figure 1.6(a).

Figure 1.6: (a) The vector product is orthogonal to both a and b. Twist from first to second and move in the direction of a right-handed screw. (b) Area of parallelogram is $ab\sin\theta$.


1.3.1 Geometrical interpretation of vector product 

The magnitude of the vector product (a × b) is equal to the area of the parallelogram whose sides are parallel to, and have lengths equal to the magnitudes of, the vectors a and b (Figure 1.6b). Its direction is perpendicular to the parallelogram.

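Example

Q. g is the vector from A [1,2,3] to B [3,4,5].

$\hat{\mathbf{l}}$ is the unit vector in the direction from O to A.

Find $\hat{\mathbf{m}}$, a UNIT vector along $\mathbf{g}\times\hat{\mathbf{l}}$.
Verify that $\hat{\mathbf{m}}$ is perpendicular to $\hat{\mathbf{l}}$.

Find $\hat{\mathbf{n}}$, the third member of a right-handed coordinate set $\hat{\mathbf{l}}, \hat{\mathbf{m}}, \hat{\mathbf{n}}$.

A.

\[
\mathbf{g} = [3,4,5] - [1,2,3] = [2,2,2] , \qquad \hat{\mathbf{l}} = \frac{1}{\sqrt{14}}\,[1,2,3]
\]

\[
\mathbf{g}\times\hat{\mathbf{l}} = \frac{1}{\sqrt{14}} \begin{vmatrix} \hat{\imath} & \hat{\jmath} & \hat{k} \\ 2 & 2 & 2 \\ 1 & 2 & 3 \end{vmatrix} = \frac{1}{\sqrt{14}}\,[2,-4,2]
\]

Hence

\[
\hat{\mathbf{m}} = \frac{1}{\sqrt{24}}\,[2,-4,2]
\]

and

\[
\hat{\mathbf{n}} = \hat{\mathbf{l}}\times\hat{\mathbf{m}} .
\]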
Lecture 2


Multiple Products. Geometry using Vectors 


2.1 Triple and multiple products 

Using mixtures of the pairwise scalar product and vector product, it is possible to 
derive “triple products" between three vectors, and indeed n-products between n 
vectors. 

There is nothing about these that you cannot work out from the definitions of pairwise scalar and vector products already given, but some have interesting geometric interpretations, so it is worth looking at these.

2.1.1 Scalar triple product 

This is the scalar product of a vector product and a third vector, i.e. a · (b × c). Using the pseudo-determinant expression for the vector product, we see that the scalar triple product can be represented as the true determinant

\[
\mathbf{a}\cdot(\mathbf{b}\times\mathbf{c}) = \begin{vmatrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \\ c_1 & c_2 & c_3 \end{vmatrix}
\]

You will recall that if you swap a pair of rows of a determinant, its sign changes; hence if you swap two pairs, its sign stays the same.

\[
\underset{+}{\begin{vmatrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \\ c_1 & c_2 & c_3 \end{vmatrix}}
\ \xrightarrow{\text{1st SWAP}}\
\underset{-}{\begin{vmatrix} c_1 & c_2 & c_3 \\ b_1 & b_2 & b_3 \\ a_1 & a_2 & a_3 \end{vmatrix}}
\ \xrightarrow{\text{2nd SWAP}}\
\underset{+}{\begin{vmatrix} c_1 & c_2 & c_3 \\ a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \end{vmatrix}}
\]

This says that

(i) a · (b × c) = b · (c × a) = c · (a × b) (Called cyclic permutation.)

(ii) a · (b × c) = −b · (a × c) and so on. (Called anti-cyclic permutation.)

(iii) The fact that a · (b × c) = (a × b) · c allows the scalar triple product to be written as [a, b, c]. This notation is not very helpful, and we will try to avoid it below.

2.1.2 Geometrical interpretation of scalar triple product 

The scalar triple product gives the volume of the parallelopiped whose sides are 
represented by the vectors a, b, and c. 

We saw earlier that the vector product (a x b) has magnitude equal to the area 
of the base, and direction perpendicular to the base. The component of c in this 
direction is equal to the height of the parallelopiped shown in Figure 2.1(a). 




Figure 2.1: (a) Scalar triple product equals volume of parallelopiped. (b) Coplanarity yields zero 
scalar triple product. 


2.1.3 Linearly dependent vectors 

If the scalar triple product of three vectors is zero

\[
\mathbf{a}\cdot(\mathbf{b}\times\mathbf{c}) = 0
\]

then the vectors are linearly dependent. That is, one can be expressed as a linear combination of the others. For example,

\[
\mathbf{a} = \lambda\mathbf{b} + \mu\mathbf{c}
\]

where λ and μ are scalar coefficients.

You can see this immediately in two ways: 

• The determinant would have one row that was a linear combination of the 
others. You’ll remember that by doing row operations, you could reach a row 
of zeros, and so the determinant is zero. 

• The parallelopiped would have zero volume if squashed flat. In this case all 
three vectors lie in a plane, and so any one is a linear combination of the 
other two. (Figure 2.1b.) 
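The permutation rules and the coplanarity test can be illustrated with a small scalar triple product routine. The helper names and the sample vectors below are chosen here and are not part of the notes.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]


def triple(a, b, c):
    """Scalar triple product a . (b x c)."""
    return dot(a, cross(b, c))


a, b, c = [1, 2, 3], [2, 3, 1], [3, 1, 2]

print(triple(a, b, c), triple(b, c, a), triple(c, a, b))  # cyclic: all equal
print(triple(b, a, c))                                    # one swap: sign flips

# A vector built as a linear combination of b and c is coplanar with them.
coplanar = [2 * bi + 3 * ci for bi, ci in zip(b, c)]
print(triple(coplanar, b, c))                             # zero volume
```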
2.1.4 Vector triple product

This is defined as the vector product of a vector with a vector product, a × (b × c). Now, the vector triple product a × (b × c) must be perpendicular to (b × c), which in turn is perpendicular to both b and c. Thus a × (b × c) can have no component perpendicular to b and c, and hence must be coplanar with them. It follows that the vector triple product must be expressible as a linear combination of b and c:

\[
\mathbf{a}\times(\mathbf{b}\times\mathbf{c}) = \lambda\mathbf{b} + \mu\mathbf{c} .
\]

The values of the coefficients can be obtained by multiplying out in component form. Only the first component need be evaluated, the others then being obtained by symmetry. That is

\[
\begin{aligned}
(\mathbf{a}\times(\mathbf{b}\times\mathbf{c}))_1
&= a_2(\mathbf{b}\times\mathbf{c})_3 - a_3(\mathbf{b}\times\mathbf{c})_2 \\
&= a_2(b_1 c_2 - b_2 c_1) + a_3(b_1 c_3 - b_3 c_1) \\
&= (a_2 c_2 + a_3 c_3)\,b_1 - (a_2 b_2 + a_3 b_3)\,c_1 \\
&= (a_1 c_1 + a_2 c_2 + a_3 c_3)\,b_1 - (a_1 b_1 + a_2 b_2 + a_3 b_3)\,c_1 \\
&= (\mathbf{a}\cdot\mathbf{c})\,b_1 - (\mathbf{a}\cdot\mathbf{b})\,c_1
\end{aligned}
\]

The equivalents must be true for the 2nd and 3rd components, so we arrive at the identity

\[
\mathbf{a}\times(\mathbf{b}\times\mathbf{c}) = (\mathbf{a}\cdot\mathbf{c})\,\mathbf{b} - (\mathbf{a}\cdot\mathbf{b})\,\mathbf{c} .
\]
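The vector triple product identity can be spot-checked by evaluating both sides on sample vectors. The vectors and helper names below are chosen here for illustration; agreement on one example is a sanity check, not a proof.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]


a, b, c = [1, 2, 3], [2, 3, 1], [3, 1, 2]

lhs = cross(a, cross(b, c))                                   # a x (b x c)
rhs = [dot(a, c) * bi - dot(a, b) * ci for bi, ci in zip(b, c)]  # (a.c)b - (a.b)c
print(lhs, rhs)
```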



2.1.5 Projection using vector triple product 

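An example of the application of this formula is as follows. Suppose v is a vector and we want its projection into the xy-plane. The component of v in the z direction is $\mathbf{v}\cdot\hat{k}$, so the projection we seek is $\mathbf{v} - (\mathbf{v}\cdot\hat{k})\hat{k}$. Writing $\hat{k} \leftarrow \mathbf{a}$, $\mathbf{v} \leftarrow \mathbf{b}$, $\hat{k} \leftarrow \mathbf{c}$,

\[
\begin{aligned}
\mathbf{a}\times(\mathbf{b}\times\mathbf{c}) &= (\mathbf{a}\cdot\mathbf{c})\,\mathbf{b} - (\mathbf{a}\cdot\mathbf{b})\,\mathbf{c} \\
\hat{k}\times(\mathbf{v}\times\hat{k}) &= (\hat{k}\cdot\hat{k})\,\mathbf{v} - (\hat{k}\cdot\mathbf{v})\,\hat{k} \\
&= \mathbf{v} - (\mathbf{v}\cdot\hat{k})\,\hat{k}
\end{aligned}
\]

So $\mathbf{v} - (\mathbf{v}\cdot\hat{k})\hat{k} = \hat{k}\times(\mathbf{v}\times\hat{k})$.

(Hot stuff! But the expression $\mathbf{v} - (\mathbf{v}\cdot\hat{k})\hat{k}$ is much easier to understand, and cheaper to compute!)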

2.1.6 Other repeated products 

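Many combinations of vector and scalar products are possible, but we consider only one more, namely the vector quadruple product (a × b) × (c × d). By regarding a × b as a single vector, we see that this vector must be representable as a linear combination of c and d. On the other hand, regarding c × d as a single vector, we see that it must also be a linear combination of a and b. This provides a means of expressing one of the vectors, say d, as a linear combination of the other three, as follows:

\[
\begin{aligned}
(\mathbf{a}\times\mathbf{b})\times(\mathbf{c}\times\mathbf{d})
&= [(\mathbf{a}\times\mathbf{b})\cdot\mathbf{d}]\,\mathbf{c} - [(\mathbf{a}\times\mathbf{b})\cdot\mathbf{c}]\,\mathbf{d} \\
&= [(\mathbf{c}\times\mathbf{d})\cdot\mathbf{a}]\,\mathbf{b} - [(\mathbf{c}\times\mathbf{d})\cdot\mathbf{b}]\,\mathbf{a}
\end{aligned}
\]

Hence

\[
[(\mathbf{a}\times\mathbf{b})\cdot\mathbf{c}]\,\mathbf{d} = [(\mathbf{b}\times\mathbf{c})\cdot\mathbf{d}]\,\mathbf{a} + [(\mathbf{c}\times\mathbf{a})\cdot\mathbf{d}]\,\mathbf{b} + [(\mathbf{a}\times\mathbf{b})\cdot\mathbf{d}]\,\mathbf{c}
\]

or

\[
\mathbf{d} = \frac{[(\mathbf{b}\times\mathbf{c})\cdot\mathbf{d}]\,\mathbf{a} + [(\mathbf{c}\times\mathbf{a})\cdot\mathbf{d}]\,\mathbf{b} + [(\mathbf{a}\times\mathbf{b})\cdot\mathbf{d}]\,\mathbf{c}}{[(\mathbf{a}\times\mathbf{b})\cdot\mathbf{c}]} = \alpha\mathbf{a} + \beta\mathbf{b} + \gamma\mathbf{c} .
\]

This is not something to remember off by heart, but it is worth remembering that the projection of a vector on any arbitrary basis set is unique.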


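Example

Q1 Use the quadruple vector product to express the vector d = [3, 2, 1] in terms of the vectors a = [1, 2, 3], b = [2, 3, 1] and c = [3, 1, 2].

A1 Grinding away at the determinants, we find

\[
[(\mathbf{a}\times\mathbf{b})\cdot\mathbf{c}] = -18 ; \quad [(\mathbf{b}\times\mathbf{c})\cdot\mathbf{d}] = 6 ; \quad [(\mathbf{c}\times\mathbf{a})\cdot\mathbf{d}] = -12 ; \quad [(\mathbf{a}\times\mathbf{b})\cdot\mathbf{d}] = -12
\]

So, d = (−a + 2b + 2c)/3.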
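The determinants in this example can be ground out mechanically. The sketch below uses exact rational arithmetic from the standard library so that the coefficients come out as fractions; the helper names are illustrative.

```python
from fractions import Fraction


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]


def triple(a, b, c):
    """(a x b) . c"""
    return dot(cross(a, b), c)


a, b, c, d = [1, 2, 3], [2, 3, 1], [3, 1, 2], [3, 2, 1]

denom = triple(a, b, c)
alpha = Fraction(triple(b, c, d), denom)
beta = Fraction(triple(c, a, d), denom)
gamma = Fraction(triple(a, b, d), denom)

# Rebuild d from its components on the basis a, b, c.
rebuilt = [alpha * ai + beta * bi + gamma * ci for ai, bi, ci in zip(a, b, c)]
print(alpha, beta, gamma, rebuilt)
```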
Figure 2.3: The projection of a (3-)vector onto a set of (3) basis vectors is unique, i.e. in d = αa + βb + γc, the set {α, β, γ} is unique.

2.2 Geometry using vectors: lines, planes 

2.2.1 The equation of a line 

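The equation of the line passing through the point whose position vector is a and lying in the direction of vector b is

\[
\mathbf{r} = \mathbf{a} + \lambda\mathbf{b}
\]

where λ is a scalar parameter. If you make b a unit vector, $\mathbf{r} = \mathbf{a} + \lambda\hat{\mathbf{b}}$, then λ will represent metric length.

For a line defined by two points $\mathbf{a}_1$ and $\mathbf{a}_2$

\[
\mathbf{r} = \mathbf{a}_1 + \lambda(\mathbf{a}_2 - \mathbf{a}_1)
\]

or for the unit version

\[
\mathbf{r} = \mathbf{a}_1 + \lambda\,\frac{\mathbf{a}_2 - \mathbf{a}_1}{|\mathbf{a}_2 - \mathbf{a}_1|} .
\]

Figure 2.4: Equation of a line. Point r traces out the line. With $\hat{\mathbf{b}}$ a unit vector, λ is in the length units established by the definition of a.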
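2.2.2 The shortest distance from a point to a line

Referring to Figure 2.5(a) the vector p from c to any point on the line is $\mathbf{p} = \mathbf{a} + \lambda\hat{\mathbf{b}} - \mathbf{c} = (\mathbf{a} - \mathbf{c}) + \lambda\hat{\mathbf{b}}$ which has length squared $p^2 = (\mathbf{a} - \mathbf{c})^2 + \lambda^2 + 2\lambda(\mathbf{a} - \mathbf{c})\cdot\hat{\mathbf{b}}$. Rather than minimizing length, it is easier to minimize length-squared. The minimum is found when $dp^2/d\lambda = 0$, i.e. when

\[
\lambda = -(\mathbf{a} - \mathbf{c})\cdot\hat{\mathbf{b}} .
\]

So the minimum length vector is

\[
\mathbf{p} = (\mathbf{a} - \mathbf{c}) - ((\mathbf{a} - \mathbf{c})\cdot\hat{\mathbf{b}})\,\hat{\mathbf{b}} .
\]

You might spot that this is the component of (a − c) perpendicular to $\hat{\mathbf{b}}$ (as expected!). Furthermore, using the result of Section 2.1.5,

\[
\mathbf{p} = \hat{\mathbf{b}}\times[(\mathbf{a} - \mathbf{c})\times\hat{\mathbf{b}}] .
\]

Because $\hat{\mathbf{b}}$ is a unit vector, and is orthogonal to $[(\mathbf{a} - \mathbf{c})\times\hat{\mathbf{b}}]$, the modulus of the vector can be written rather more simply as just

\[
p_{\min} = |(\mathbf{a} - \mathbf{c})\times\hat{\mathbf{b}}| .
\]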
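The two routes to the point-to-line distance (minimising over λ, or taking the modulus of the cross product) can be compared numerically. The function names and the sample line and point below are hypothetical, chosen here for illustration.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]


def sub(a, b):
    return [x - y for x, y in zip(a, b)]


def distance_by_cross(c, a, b):
    """|(a - c) x b_hat| for the line r = a + lambda*b (b need not be unit length)."""
    b_len = math.sqrt(dot(b, b))
    b_hat = [x / b_len for x in b]
    w = cross(sub(a, c), b_hat)
    return math.sqrt(dot(w, w))


def distance_by_minimising(c, a, b):
    """Length of p = (a - c) + lambda*b_hat at the minimising lambda = -(a - c).b_hat."""
    b_len = math.sqrt(dot(b, b))
    b_hat = [x / b_len for x in b]
    ac = sub(a, c)
    lam = -dot(ac, b_hat)
    p = [x + lam * y for x, y in zip(ac, b_hat)]
    return math.sqrt(dot(p, p))


a, b, c = [1.0, 0.0, 0.0], [0.0, 0.0, 2.0], [4.0, 4.0, 7.0]
print(distance_by_cross(c, a, b), distance_by_minimising(c, a, b))
```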



Figure 2.5: (a) Shortest distance point to line, (b) Shortest distance, line to line. 

2.2.3 The shortest distance between two straight lines 

If the shortest distance between a point and a line is along the perpendicular, then the shortest distance between the two straight lines r = a + λb and r = c + μd must be found as the length of the vector which is mutually perpendicular to the lines.

The unit vector along the mutual perpendicular is

\[
\hat{\mathbf{p}} = (\mathbf{b}\times\mathbf{d})/|\mathbf{b}\times\mathbf{d}| .
\]

(Yes, don't forget that b × d is NOT a unit vector. b and d are not orthogonal, so there is a sin θ lurking!)

The minimum length is therefore the component of a − c in this direction

\[
p_{\min} = \left| (\mathbf{a} - \mathbf{c})\cdot(\mathbf{b}\times\mathbf{d})/|\mathbf{b}\times\mathbf{d}| \right| .
\]
Example

Q Two long straight pipes are specified using Cartesian co-ordinates as follows:
Pipe A has diameter 0.8 and its axis passes through points (2, 5, 3) and
(7, 10, 8).

Pipe B has diameter 1.0 and its axis passes through the points (0, 6, 3) and
(-12, 0, 9).

Determine whether the pipes need to be realigned to avoid intersection.

A Each pipe axis is defined using two points. The vector equation of the axis
of pipe A is

$$\mathbf{r} = [2, 5, 3] + \lambda'[5, 5, 5] = [2, 5, 3] + \lambda[1, 1, 1]/\sqrt{3}$$

The equation of the axis of pipe B is

$$\mathbf{r} = [0, 6, 3] + \mu'[-12, -6, 6] = [0, 6, 3] + \mu[-2, -1, 1]/\sqrt{6}$$

The perpendicular to the two axes has direction

$$[1, 1, 1] \times [-2, -1, 1] = \begin{vmatrix} \hat{\imath} & \hat{\jmath} & \hat{k} \\ 1 & 1 & 1 \\ -2 & -1 & 1 \end{vmatrix} = [2, -3, 1] = \mathbf{p}$$

The length of the mutual perpendicular is

$$\left| (\mathbf{a} - \mathbf{c}) \cdot \frac{[2, -3, 1]}{\sqrt{14}} \right| = \left| [2, -1, 0] \cdot \frac{[2, -3, 1]}{\sqrt{14}} \right| = \frac{7}{\sqrt{14}} = 1.87 .$$

But the sum of the radii of the two pipes is 0.4 + 0.5 = 0.9. Hence the pipes
do not intersect.
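
The pipe calculation can be reproduced in a few lines. This is a minimal sketch in plain Python; the function name `line_line_distance` and the small vector helpers are illustrative names, not part of the notes. The direction vectors are left un-normalised, which is harmless because the formula divides by $|\mathbf{b} \times \mathbf{d}|$.

```python
import math

def sub(u, v):
    return [ui - vi for ui, vi in zip(u, v)]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def cross(u, v):
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]

def line_line_distance(a, b, c, d):
    """Shortest distance between r = a + lambda*b and r = c + mu*d (non-parallel lines)."""
    p = cross(b, d)
    norm_p = math.sqrt(dot(p, p))
    if norm_p == 0.0:
        raise ValueError("lines are parallel; this formula does not apply")
    return abs(dot(sub(a, c), p)) / norm_p

# Pipe A axis through (2,5,3) and (7,10,8); pipe B axis through (0,6,3) and (-12,0,9).
a = [2, 5, 3]
b = sub([7, 10, 8], a)
c = [0, 6, 3]
d = sub([-12, 0, 9], c)

gap = line_line_distance(a, b, c, d)
clearance_needed = 0.8 / 2 + 1.0 / 2   # sum of the two radii

print(round(gap, 2), gap > clearance_needed)
```

The axes are separated by more than the sum of the radii, in agreement with the worked answer.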

2.2.4 The equation of a plane 

There are a number of ways of specifying the equation of a plane. 


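1. If $\mathbf{b}$ and $\mathbf{c}$ are two non-parallel vectors (ie $\mathbf{b} \times \mathbf{c} \neq \mathbf{0}$), then the equation of
the plane passing through the point $\mathbf{a}$ and parallel to the vectors $\mathbf{b}$ and $\mathbf{c}$ may
be written in the form

$$\mathbf{r} = \mathbf{a} + \lambda\mathbf{b} + \mu\mathbf{c}$$

where $\lambda$, $\mu$ are scalar parameters. Note that $\mathbf{b}$ and $\mathbf{c}$ are free vectors, so don't
have to lie in the plane (Figure 2.6(a)).

2. Figure 2.6(b) shows the plane defined by three non-collinear points $\mathbf{a}$, $\mathbf{b}$ and
$\mathbf{c}$ in the plane (note that the vectors $\mathbf{b}$ and $\mathbf{c}$ are position vectors, not free
vectors as in the previous case). The equation might be written as

$$\mathbf{r} = \mathbf{a} + \lambda(\mathbf{b} - \mathbf{a}) + \mu(\mathbf{c} - \mathbf{a})$$

3. Figure 2.6(c) illustrates another description, in terms of the unit normal to
the plane $\hat{\mathbf{n}}$ and a point $\mathbf{a}$ in the plane

$$\mathbf{r} \cdot \hat{\mathbf{n}} = \mathbf{a} \cdot \hat{\mathbf{n}} .$$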
Figure 2.6: (a) Plane defined using point and two lines, (b) Plane defined using three points, (c)
Plane defined using point and normal. Vector $\mathbf{r}$ is the position vector of a general point in the
plane.


2.2.5 The shortest distance from a point to a plane 

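The shortest distance from a point $\mathbf{d}$ to the plane is along the perpendicular.
Depending on how the plane is defined, this can be written as

$$D = |(\mathbf{d} - \mathbf{a}) \cdot \hat{\mathbf{n}}| \quad\text{or}\quad D = \frac{|(\mathbf{d} - \mathbf{a}) \cdot (\mathbf{b} \times \mathbf{c})|}{|\mathbf{b} \times \mathbf{c}|} .$$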


2.3 Solution of vector equations 

It is sometimes required to obtain the most general vector which satisfies a given 
vector relationship. This is very much like obtaining the locus of a point. The best 
method of proceeding in such a case is as follows: 

(i) Decide upon a system of three co-ordinate vectors using two non-parallel vectors
appearing in the vector relationship. These might be $\mathbf{a}$, $\mathbf{b}$ and their vector product
$(\mathbf{a} \times \mathbf{b})$.

(ii) Express the unknown vector $\mathbf{x}$ as a linear combination of these vectors

$$\mathbf{x} = \lambda\mathbf{a} + \mu\mathbf{b} + \nu\,\mathbf{a} \times \mathbf{b}$$

where $\lambda$, $\mu$, $\nu$ are scalars to be found.

(iii) Substitute the above expression for $\mathbf{x}$ into the vector relationship to determine
the constraints on $\lambda$, $\mu$ and $\nu$ for the relationship to be satisfied.

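Example

Q Solve the vector equation $\mathbf{x} = \mathbf{x} \times \mathbf{a} + \mathbf{b}$.

A Step (i): Set up basis vectors $\mathbf{a}$, $\mathbf{b}$ and their vector product $\mathbf{a} \times \mathbf{b}$.

Step (ii): $\mathbf{x} = \lambda\mathbf{a} + \mu\mathbf{b} + \nu\,\mathbf{a} \times \mathbf{b}$.

Step (iii): Bung this expression for $\mathbf{x}$ into the equation!

$$\begin{aligned}
\lambda\mathbf{a} + \mu\mathbf{b} + \nu\,\mathbf{a} \times \mathbf{b} &= (\lambda\mathbf{a} + \mu\mathbf{b} + \nu\,\mathbf{a} \times \mathbf{b}) \times \mathbf{a} + \mathbf{b} \\
&= \mathbf{0} + \mu(\mathbf{b} \times \mathbf{a}) + \nu(\mathbf{a} \times \mathbf{b}) \times \mathbf{a} + \mathbf{b} \\
&= -\nu(\mathbf{a}\cdot\mathbf{b})\mathbf{a} + (\nu a^2 + 1)\mathbf{b} - \mu(\mathbf{a} \times \mathbf{b})
\end{aligned}$$

We have learned that any vector has a unique expression in terms of a basis
set, so that the coefficients of $\mathbf{a}$, $\mathbf{b}$ and $\mathbf{a} \times \mathbf{b}$ on either side of the equation
must be equal.

$$\Rightarrow\quad \lambda = -\nu(\mathbf{a}\cdot\mathbf{b}), \qquad \mu = \nu a^2 + 1, \qquad \nu = -\mu$$

so that

$$\mu = \frac{1}{1 + a^2}, \qquad \nu = -\frac{1}{1 + a^2}, \qquad \lambda = \frac{\mathbf{a}\cdot\mathbf{b}}{1 + a^2} .$$

So finally the solution is the single point:

$$\mathbf{x} = \frac{1}{1 + a^2}\left((\mathbf{a}\cdot\mathbf{b})\mathbf{a} + \mathbf{b} - (\mathbf{a} \times \mathbf{b})\right)$$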
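
A quick numerical check of the result is worthwhile. With $\mu = 1/(1+a^2)$, $\nu = -1/(1+a^2)$ and $\lambda = (\mathbf{a}\cdot\mathbf{b})/(1+a^2)$, the solution carries an overall factor $1/(1+a^2)$. The sketch below (plain Python, hypothetical sample vectors and helper names) substitutes it back into $\mathbf{x} = \mathbf{x} \times \mathbf{a} + \mathbf{b}$.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def cross(u, v):
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]

def solve(a, b):
    """x = ((a.b) a + b - a x b) / (1 + a^2), the candidate solution of x = x x a + b."""
    a2 = dot(a, a)
    ab = dot(a, b)
    axb = cross(a, b)
    return [(ab * a[i] + b[i] - axb[i]) / (1.0 + a2) for i in range(3)]

# Arbitrary (hypothetical) test vectors.
a = [1.0, -2.0, 0.5]
b = [0.3, 4.0, -1.0]

x = solve(a, b)
rhs = [ci + bi for ci, bi in zip(cross(x, a), b)]
residual = max(abs(xi - ri) for xi, ri in zip(x, rhs))

print(x, residual)
```

The residual should be at rounding level for any choice of $\mathbf{a}$ and $\mathbf{b}$.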


2.4 Rotation, angular velocity/acceleration and moments 

A rotation can be represented by a vector whose direction is along the axis of
rotation in the sense of a r-h screw, and whose magnitude is proportional to the
size of the rotation (Fig. 2.7). The same idea can be extended to the derivatives,
that is, angular velocity $\boldsymbol{\omega}$ and angular acceleration $\dot{\boldsymbol{\omega}}$.

Angular accelerations arise because of a moment (or torque) on a body. In
mechanics, the moment of a force $\mathbf{F}$ about a point Q is defined to have magnitude
$M = Fd$, where $d$ is the perpendicular distance between Q and the line of action
L of $\mathbf{F}$.

The vector equation for moment is

$$\mathbf{M} = \mathbf{r} \times \mathbf{F}$$

where $\mathbf{r}$ is the vector from Q to any point on the line of action L of force $\mathbf{F}$.
The resulting angular acceleration vector is in the same direction as the moment
vector.

The instantaneous velocity of any point P on a rigid body undergoing pure rotation
can be defined by a vector product as follows. The angular velocity vector $\boldsymbol{\omega}$ has
magnitude equal to the angular speed of rotation of the body and with direction
the same as that of the r-h screw. If $\mathbf{r}$ is the vector OP, where the origin O can
be taken to be any point on the axis of rotation, then the velocity $\mathbf{v}$ of P due to
the rotation is given, in both magnitude and direction, by the vector product

$$\mathbf{v} = \boldsymbol{\omega} \times \mathbf{r} .$$

Figure 2.7: The angular velocity vector $\boldsymbol{\omega}$ is along the axis of rotation (in the right-hand screw sense) and has magnitude equal to
the rate of rotation.
Lecture 3


Differentiating Vector Functions of a Single 
Variable 


Your experience of differentiation and integration has extended as far as scalar 
functions of single and multiple variables — functions like f(x) and f(x, y, t). 

It should be no great surprise that we often wish to differentiate vector functions.
For example, suppose you were driving along a wiggly road with position
$\mathbf{r}(t)$ at time $t$. Differentiating $\mathbf{r}(t)$ wrt time should yield your velocity $\mathbf{v}(t)$, and
differentiating $\mathbf{v}(t)$ should yield your acceleration. Let's see how to do this.


3.1 Differentiation of a vector 


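The derivative of a vector function $\mathbf{a}(p)$ of a single parameter $p$ is

$$\mathbf{a}'(p) = \lim_{\delta p \to 0} \frac{\mathbf{a}(p + \delta p) - \mathbf{a}(p)}{\delta p} .$$

If we write $\mathbf{a}$ in terms of components relative to a FIXED coordinate system ($\hat{\imath}, \hat{\jmath}, \hat{k}$
constant)

$$\mathbf{a}(p) = a_1(p)\hat{\imath} + a_2(p)\hat{\jmath} + a_3(p)\hat{k}$$

then

$$\mathbf{a}'(p) = \frac{da_1}{dp}\hat{\imath} + \frac{da_2}{dp}\hat{\jmath} + \frac{da_3}{dp}\hat{k} .$$
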

That is, in order to differentiate a vector function, one simply differentiates each 
component separately. This means that all the familiar rules of differentiation 
apply, and they don’t get altered by vector operations like scalar product and 
vector products. 

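Thus, for example:

$$\frac{d}{dp}(\mathbf{a} \times \mathbf{b}) = \frac{d\mathbf{a}}{dp} \times \mathbf{b} + \mathbf{a} \times \frac{d\mathbf{b}}{dp}$$

$$\frac{d}{dp}(\mathbf{a}\cdot\mathbf{b}) = \frac{d\mathbf{a}}{dp}\cdot\mathbf{b} + \mathbf{a}\cdot\frac{d\mathbf{b}}{dp}$$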
Note that $d\mathbf{a}/dp$ has a different direction and a different magnitude from $\mathbf{a}$.

Likewise, as you might expect, the chain rule still applies. If $\mathbf{a} = \mathbf{a}(u)$ and $u = u(t)$,
say:

$$\frac{d}{dt}\mathbf{a} = \frac{d\mathbf{a}}{du}\frac{du}{dt}$$


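Examples

Q A 3D vector $\mathbf{a}$ of constant magnitude is varying over time. What can you say
about the direction of $\dot{\mathbf{a}}$?

A Using intuition: if only the direction is changing, then the vector must be
tracing out points on the surface of a sphere. We would guess that the
derivative $\dot{\mathbf{a}}$ is orthogonal to $\mathbf{a}$.

To prove this write

$$\frac{d}{dt}(\mathbf{a}\cdot\mathbf{a}) = \mathbf{a}\cdot\frac{d\mathbf{a}}{dt} + \frac{d\mathbf{a}}{dt}\cdot\mathbf{a} = 2\mathbf{a}\cdot\frac{d\mathbf{a}}{dt} .$$

But $(\mathbf{a}\cdot\mathbf{a}) = a^2$ which we are told is constant. So

$$\frac{d}{dt}(\mathbf{a}\cdot\mathbf{a}) = 0 \quad\Rightarrow\quad 2\mathbf{a}\cdot\frac{d\mathbf{a}}{dt} = 0$$

and hence $\mathbf{a}$ and $d\mathbf{a}/dt$ must be perpendicular.

Q The position of a vehicle is $\mathbf{r}(u)$ where $u$ is the amount of fuel consumed by
some time $t$. Write down an expression for the acceleration.

A The velocity is

$$\mathbf{v} = \frac{d\mathbf{r}}{dt} = \frac{d\mathbf{r}}{du}\frac{du}{dt}$$

and the acceleration is

$$\frac{d}{dt}\frac{d\mathbf{r}}{dt} = \frac{d^2\mathbf{r}}{du^2}\left(\frac{du}{dt}\right)^2 + \frac{d\mathbf{r}}{du}\frac{d^2u}{dt^2} .$$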


3.1.1 Geometrical interpretation of vector derivatives 

Let $\mathbf{r}(p)$ be a position vector tracing a space curve as some parameter $p$ varies.
The vector $\delta\mathbf{r}$ is a secant to the curve, and $\delta\mathbf{r}/\delta p$ lies in the same direction. (See
Fig. 3.1.) In the limit as $\delta p$ tends to zero $\delta\mathbf{r}/\delta p = d\mathbf{r}/dp$ becomes a tangent to
the space curve. If the magnitude of this vector is 1 (i.e. a unit tangent), then
$|d\mathbf{r}| = dp$ so the parameter $p$ is arc-length (metric distance). More generally,
however, $p$ will not be arc-length and we will have:

$$\frac{d\mathbf{r}}{dp} = \frac{d\mathbf{r}}{ds}\frac{ds}{dp}$$

So, the direction of the derivative is that of a tangent to the curve, and its
magnitude is $|ds/dp|$, the rate of change of arc length w.r.t the parameter.

Of course if that parameter $p$ is time, the magnitude $|d\mathbf{r}/dt|$ is the speed.


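Example

Q Draw the curve

$$\mathbf{r} = a\cos\left(\frac{s}{\sqrt{a^2 + h^2}}\right)\hat{\imath} + a\sin\left(\frac{s}{\sqrt{a^2 + h^2}}\right)\hat{\jmath} + \frac{hs}{\sqrt{a^2 + h^2}}\hat{k}$$

where $s$ is arc length and $h$, $a$ are constants. Show that the tangent $d\mathbf{r}/ds$
to the curve has a constant elevation angle w.r.t the xy-plane, and determine
its magnitude.

A

$$\frac{d\mathbf{r}}{ds} = -\frac{a}{\sqrt{a^2 + h^2}}\sin(\cdot)\,\hat{\imath} + \frac{a}{\sqrt{a^2 + h^2}}\cos(\cdot)\,\hat{\jmath} + \frac{h}{\sqrt{a^2 + h^2}}\hat{k}$$

where $(\cdot)$ is shorthand for the argument $s/\sqrt{a^2 + h^2}$.

The projection on the xy plane has magnitude $a/\sqrt{a^2 + h^2}$ and in the z
direction $h/\sqrt{a^2 + h^2}$, so the elevation angle is a constant, $\tan^{-1}(h/a)$.

We are expecting $|d\mathbf{r}/ds| = 1$, and indeed

$$\sqrt{a^2\sin^2(\cdot) + a^2\cos^2(\cdot) + h^2}\,/\sqrt{a^2 + h^2} = 1 .$$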


3.1.2 Arc length is a special parameter! 

It might seem that we can be completely relaxed about saying that any old
parameter $p$ is arc length, but this is not the case. Why not? The reason that arc
length is special is that, whatever the parameter $p$,

$$s = \int \left|\frac{d\mathbf{r}}{dp}\right| dp .$$

Perhaps another way to grasp the significance of this is using Pythagoras' theorem
on a short piece of curve: in the limit as $dx$ etc tend to zero,

$$ds^2 = dx^2 + dy^2 + dz^2 .$$
Figure 3.1: Left: $\delta\mathbf{r}$ is a secant to the curve but, in the limit as $\delta p \to 0$, becomes a tangent.
Right: if the parameter is arc length $s$, then $|d\mathbf{r}| = ds$, so $|d\mathbf{r}/ds| = 1$.


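So if a curve is parameterized in terms of $p$

$$\frac{ds}{dp} = \sqrt{\left(\frac{dx}{dp}\right)^2 + \left(\frac{dy}{dp}\right)^2 + \left(\frac{dz}{dp}\right)^2} .$$

As an example, suppose in our earlier example we had parameterized our helix as

$$\mathbf{r} = a\cos p\,\hat{\imath} + a\sin p\,\hat{\jmath} + hp\,\hat{k}$$

It would be easy just to say that $p$ was arclength, but it would not be correct
because

$$\frac{ds}{dp} = \sqrt{\left(\frac{dx}{dp}\right)^2 + \left(\frac{dy}{dp}\right)^2 + \left(\frac{dz}{dp}\right)^2} = \sqrt{a^2\sin^2 p + a^2\cos^2 p + h^2} = \sqrt{a^2 + h^2} .$$

If $p$ really was arclength, $ds/dp = 1$. So $p\sqrt{a^2 + h^2}$ is arclength, not $p$.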
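
Since $ds/dp = \sqrt{a^2 + h^2}$ is constant for this helix, the arc length accumulated from $p = 0$ should be $p\sqrt{a^2 + h^2}$. The sketch below estimates $s = \int |d\mathbf{r}/dp|\,dp$ with a simple midpoint rule; the function names and the values of $a$, $h$ and the end point are hypothetical, chosen only for illustration.

```python
import math

def arc_length(dr_dp, p0, p1, n=1000):
    """Midpoint-rule estimate of s = integral of |dr/dp| dp from p0 to p1."""
    step = (p1 - p0) / n
    total = 0.0
    for i in range(n):
        p = p0 + (i + 0.5) * step
        dx, dy, dz = dr_dp(p)
        total += math.sqrt(dx * dx + dy * dy + dz * dz) * step
    return total

a, h = 2.0, 0.5   # hypothetical helix radius and pitch constant

def helix_derivative(p):
    # d/dp of r = a cos p i + a sin p j + h p k
    return (-a * math.sin(p), a * math.cos(p), h)

p_end = 3.0
s = arc_length(helix_derivative, 0.0, p_end)
expected = p_end * math.sqrt(a * a + h * h)

print(s, expected)
```

The same `arc_length` routine applies to any curve for which $d\mathbf{r}/dp$ can be evaluated, whether or not the integrand is constant.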

3.2 Integration of a vector function 

The integration of a vector function of a single scalar variable can be regarded
simply as the reverse of differentiation. In other words

$$\int_{p_1}^{p_2} \frac{d\mathbf{a}(p)}{dp}\,dp = \mathbf{a}(p_2) - \mathbf{a}(p_1)$$

For example the integral of the acceleration vector of a point over an interval of 
time is equal to the change in the velocity vector during the same time interval. 
However, many other, more interesting and useful, types of integral are possible, 
especially when the vector is a function of more than one variable. This requires 
the introduction of the concepts of scalar and vector fields. See later! 
3.3 Curves in 3 dimensions

In the examples above, parameter $p$ has been either arc length $s$ or time $t$. It
doesn't have to be, but these are the main two of interest. Later we shall look
at some important results when differentiating w.r.t. time, but now let us look
more closely at 3D curves defined in terms of arc length, $s$.

Take a piece of wire, and bend it into some arbitrary non-planar curve. This is a 
space curve. We can specify a point on the wire by specifying r(s) as a function 
of distance or arc length s along the wire. 

3.3.1 The Frenet-Serret relationships 

We are now going to introduce a local orthogonal coordinate frame for each point 
s along the curve, ie one with its origin at r(s). To specify a coordinate frame we 
need three mutually perpendicular directions, and these should be intrinsic to the 
curve, not fixed in an external reference frame. The ideas were first suggested by 
two French mathematicians, F-J. Frenet and J. A. Serret. 


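1. Tangent $\hat{\mathbf{t}}$

There is an obvious choice for the first direction at the point $\mathbf{r}(s)$, namely the
unit tangent $\hat{\mathbf{t}}$. We already know that

$$\hat{\mathbf{t}} = \frac{d\mathbf{r}(s)}{ds}$$

2. Principal Normal $\hat{\mathbf{n}}$

Recall that earlier we proved that if $\mathbf{a}$ was a vector of constant magnitude
that varies in direction over time then $d\mathbf{a}/dt$ was perpendicular to it. Because
$\hat{\mathbf{t}}$ has constant magnitude but varies over $s$, $d\hat{\mathbf{t}}/ds$ must be perpendicular to
$\hat{\mathbf{t}}$.

Hence the principal normal $\hat{\mathbf{n}}$ is defined by

$$\frac{d\hat{\mathbf{t}}}{ds} = \kappa\hat{\mathbf{n}} , \quad\text{where } \kappa \geq 0 .$$

$\kappa$ is the curvature, and $\kappa = 0$ for a straight line. The plane containing $\hat{\mathbf{t}}$ and
$\hat{\mathbf{n}}$ is called the osculating plane.

3. The Binormal $\hat{\mathbf{b}}$

The local coordinate frame is completed by defining the binormal

$$\hat{\mathbf{b}}(s) = \hat{\mathbf{t}}(s) \times \hat{\mathbf{n}}(s) .$$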
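Since $\hat{\mathbf{b}}\cdot\hat{\mathbf{t}} = 0$

$$\frac{d\hat{\mathbf{b}}}{ds}\cdot\hat{\mathbf{t}} + \hat{\mathbf{b}}\cdot\frac{d\hat{\mathbf{t}}}{ds} = \frac{d\hat{\mathbf{b}}}{ds}\cdot\hat{\mathbf{t}} + \hat{\mathbf{b}}\cdot\kappa\hat{\mathbf{n}} = 0$$

from which

$$\frac{d\hat{\mathbf{b}}}{ds}\cdot\hat{\mathbf{t}} = 0 .$$

But this means that $d\hat{\mathbf{b}}/ds$ is along the direction of $\hat{\mathbf{n}}$ (it is also perpendicular to $\hat{\mathbf{b}}$, because $\hat{\mathbf{b}}$ has constant magnitude), or

$$\frac{d\hat{\mathbf{b}}}{ds} = -\tau(s)\hat{\mathbf{n}}(s)$$

where $\tau$ is the torsion, and the negative sign is a matter of convention.
Differentiating $\hat{\mathbf{n}}\cdot\hat{\mathbf{t}} = 0$ and $\hat{\mathbf{n}}\cdot\hat{\mathbf{b}} = 0$, we find

$$\frac{d\hat{\mathbf{n}}}{ds} = -\kappa(s)\hat{\mathbf{t}}(s) + \tau(s)\hat{\mathbf{b}}(s) .$$

The Frenet-Serret relationships:

$$\begin{aligned}
d\hat{\mathbf{t}}/ds &= \kappa\hat{\mathbf{n}} \\
d\hat{\mathbf{n}}/ds &= -\kappa(s)\hat{\mathbf{t}}(s) + \tau(s)\hat{\mathbf{b}}(s) \\
d\hat{\mathbf{b}}/ds &= -\tau(s)\hat{\mathbf{n}}(s)
\end{aligned}$$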

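Example

Q Derive $\kappa(s)$ and $\tau(s)$ for the helix

$$\mathbf{r}(s) = a\cos\left(\frac{s}{\beta}\right)\hat{\imath} + a\sin\left(\frac{s}{\beta}\right)\hat{\jmath} + h\left(\frac{s}{\beta}\right)\hat{k}, \qquad \beta = \sqrt{a^2 + h^2}$$

and comment on their values.

A We found the unit tangent earlier as

$$\hat{\mathbf{t}} = \frac{d\mathbf{r}}{ds} = \left[-\frac{a}{\beta}\sin\left(\frac{s}{\beta}\right),\ \frac{a}{\beta}\cos\left(\frac{s}{\beta}\right),\ \frac{h}{\beta}\right] .$$

Differentiation gives

$$\kappa\hat{\mathbf{n}} = \frac{d\hat{\mathbf{t}}}{ds} = \left[-\frac{a}{\beta^2}\cos\left(\frac{s}{\beta}\right),\ -\frac{a}{\beta^2}\sin\left(\frac{s}{\beta}\right),\ 0\right] .$$

Curvature is always positive, so

$$\kappa = \frac{a}{\beta^2}, \qquad \hat{\mathbf{n}} = \left[-\cos\left(\frac{s}{\beta}\right),\ -\sin\left(\frac{s}{\beta}\right),\ 0\right] .$$

So the curvature is constant, and the normal is parallel to the xy-plane.
Now use (writing $S = \sin(s/\beta)$ and $C = \cos(s/\beta)$)

$$\hat{\mathbf{b}} = \hat{\mathbf{t}} \times \hat{\mathbf{n}} = \begin{vmatrix} \hat{\imath} & \hat{\jmath} & \hat{k} \\ (-a/\beta)S & (a/\beta)C & (h/\beta) \\ -C & -S & 0 \end{vmatrix} = \left[\frac{h}{\beta}\sin\left(\frac{s}{\beta}\right),\ -\frac{h}{\beta}\cos\left(\frac{s}{\beta}\right),\ \frac{a}{\beta}\right]$$

and differentiate $\hat{\mathbf{b}}$ to find an expression for the torsion

$$\frac{d\hat{\mathbf{b}}}{ds} = \left[\frac{h}{\beta^2}\cos\left(\frac{s}{\beta}\right),\ \frac{h}{\beta^2}\sin\left(\frac{s}{\beta}\right),\ 0\right] = -\frac{h}{\beta^2}\hat{\mathbf{n}}$$

so the torsion is

$$\tau = \frac{h}{\beta^2}$$

again a constant.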
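
The constant values $\kappa = a/\beta^2$ and $\tau = h/\beta^2$ can be cross-checked with the standard formulas for a curve under an arbitrary parameter $p$, namely $\kappa = |\mathbf{r}' \times \mathbf{r}''|/|\mathbf{r}'|^3$ and $\tau = (\mathbf{r}' \times \mathbf{r}'')\cdot\mathbf{r}'''/|\mathbf{r}' \times \mathbf{r}''|^2$. These formulas are not derived in these notes and are used here as an assumption; the helper names and the numerical values are hypothetical.

```python
import math

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def cross(u, v):
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]

def curvature_and_torsion(d1, d2, d3):
    """kappa and tau from the first three parameter derivatives of r(p)."""
    c = cross(d1, d2)
    c2 = dot(c, c)
    kappa = math.sqrt(c2) / dot(d1, d1) ** 1.5
    tau = dot(c, d3) / c2
    return kappa, tau

a, h, p = 2.0, 0.5, 0.9   # hypothetical helix constants and parameter value

# Derivatives of r = a cos p i + a sin p j + h p k with respect to p.
d1 = [-a * math.sin(p), a * math.cos(p), h]
d2 = [-a * math.cos(p), -a * math.sin(p), 0.0]
d3 = [a * math.sin(p), -a * math.cos(p), 0.0]

kappa, tau = curvature_and_torsion(d1, d2, d3)
beta2 = a * a + h * h

print(kappa, a / beta2)
print(tau, h / beta2)
```

Changing `p` should leave both values unchanged, which is the numerical counterpart of the statement that curvature and torsion are constant along the helix.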


3.4 Radial and tangential components in plane polars 

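In plane polar coordinates, the radius vector of any point P is given by

$$\mathbf{r} = r\cos\theta\,\hat{\imath} + r\sin\theta\,\hat{\jmath} = r\hat{\mathbf{e}}_r$$

where we have introduced the unit radial vector

$$\hat{\mathbf{e}}_r = \cos\theta\,\hat{\imath} + \sin\theta\,\hat{\jmath} .$$

The other “natural” (we'll see why in a later lecture) unit vector in plane polars is orthogonal to $\hat{\mathbf{e}}_r$ and is

$$\hat{\mathbf{e}}_\theta = -\sin\theta\,\hat{\imath} + \cos\theta\,\hat{\jmath}$$

so that $\hat{\mathbf{e}}_r\cdot\hat{\mathbf{e}}_r = \hat{\mathbf{e}}_\theta\cdot\hat{\mathbf{e}}_\theta = 1$ and $\hat{\mathbf{e}}_r\cdot\hat{\mathbf{e}}_\theta = 0$.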
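Now suppose P is moving so that $\mathbf{r}$ is a function of time $t$. Its velocity is

$$\begin{aligned}
\dot{\mathbf{r}} = \frac{d}{dt}(r\hat{\mathbf{e}}_r) &= \frac{dr}{dt}\hat{\mathbf{e}}_r + r\frac{d\hat{\mathbf{e}}_r}{dt} \\
&= \frac{dr}{dt}\hat{\mathbf{e}}_r + r\frac{d\theta}{dt}(-\sin\theta\,\hat{\imath} + \cos\theta\,\hat{\jmath}) \\
&= \frac{dr}{dt}\hat{\mathbf{e}}_r + r\frac{d\theta}{dt}\hat{\mathbf{e}}_\theta \\
&= \text{radial} + \text{tangential}
\end{aligned}$$

The radial and tangential components of velocity of P are therefore $dr/dt$ and
$r\,d\theta/dt$, respectively.

Differentiating a second time gives the acceleration of P

$$\begin{aligned}
\ddot{\mathbf{r}} &= \frac{d^2r}{dt^2}\hat{\mathbf{e}}_r + \frac{dr}{dt}\frac{d\theta}{dt}\hat{\mathbf{e}}_\theta + \frac{dr}{dt}\frac{d\theta}{dt}\hat{\mathbf{e}}_\theta + r\frac{d^2\theta}{dt^2}\hat{\mathbf{e}}_\theta - r\frac{d\theta}{dt}\frac{d\theta}{dt}\hat{\mathbf{e}}_r \\
&= \left[\frac{d^2r}{dt^2} - r\left(\frac{d\theta}{dt}\right)^2\right]\hat{\mathbf{e}}_r + \left[2\frac{dr}{dt}\frac{d\theta}{dt} + r\frac{d^2\theta}{dt^2}\right]\hat{\mathbf{e}}_\theta
\end{aligned}$$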
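
The radial component $\ddot{r} - r\dot{\theta}^2$ and tangential component $2\dot{r}\dot{\theta} + r\ddot{\theta}$ can be compared with a finite-difference second derivative of the Cartesian position. The sketch below assumes an arbitrary (hypothetical) motion $r(t) = 1 + 0.5t$, $\theta(t) = t^2$; the function names and step size are illustrative choices.

```python
import math

def r_of(t):
    return 1.0 + 0.5 * t      # hypothetical radial law

def theta_of(t):
    return t * t              # hypothetical angular law

def cartesian(t):
    return (r_of(t) * math.cos(theta_of(t)), r_of(t) * math.sin(theta_of(t)))

def second_derivative(f, t, step=1e-4):
    """Central finite-difference estimate of the second derivative of a tuple-valued f."""
    fp, f0, fm = f(t + step), f(t), f(t - step)
    return tuple((p - 2.0 * c + m) / (step * step) for p, c, m in zip(fp, f0, fm))

t = 0.7
r, th = r_of(t), theta_of(t)
r_dot, r_ddot = 0.5, 0.0          # derivatives of r(t) = 1 + 0.5 t
th_dot, th_ddot = 2.0 * t, 2.0    # derivatives of theta(t) = t^2

a_radial = r_ddot - r * th_dot ** 2
a_tangential = 2.0 * r_dot * th_dot + r * th_ddot

# Project the numerically differentiated Cartesian acceleration onto e_r and e_theta.
e_r = (math.cos(th), math.sin(th))
e_th = (-math.sin(th), math.cos(th))
ax, ay = second_derivative(cartesian, t)
num_radial = ax * e_r[0] + ay * e_r[1]
num_tangential = ax * e_th[0] + ay * e_th[1]

print(a_radial, num_radial)
print(a_tangential, num_tangential)
```

Agreement is only to finite-difference accuracy, so the comparison should be made with a tolerance rather than exactly.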
3.5 Rotating systems

Consider a body which is rotating with constant angular velocity $\boldsymbol{\omega}$ about some
axis passing through the origin. Assume the origin is fixed, and that we are sitting
in a fixed coordinate system Oxyz.

If $\mathbf{p}$ is a vector of constant magnitude and constant direction in the rotating system,
then its representation $\mathbf{r}$ in the fixed system must be a function of $t$.

$$\mathbf{r}(t) = \mathsf{R}(t)\mathbf{p}$$

At any instant as observed in the fixed system

$$\frac{d\mathbf{r}}{dt} = \dot{\mathsf{R}}\mathbf{p} + \mathsf{R}\dot{\mathbf{p}}$$

but the second term is zero since we assumed $\mathbf{p}$ to be constant so we have

$$\frac{d\mathbf{r}}{dt} = \dot{\mathsf{R}}\mathbf{p} = \dot{\mathsf{R}}\mathsf{R}^\top\mathbf{r}$$

Note that:

• $d\mathbf{r}/dt$ will have fixed magnitude;

• $d\mathbf{r}/dt$ will always be perpendicular to the axis of rotation;

• $d\mathbf{r}/dt$ will vary in direction within those constraints;

• $\mathbf{r}(t)$ will move in a plane in the fixed system.
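Now let's consider the term $\dot{\mathsf{R}}\mathsf{R}^\top$. First, note that $\mathsf{R}\mathsf{R}^\top = \mathsf{I}$ (the identity), so
differentiating both sides yields

$$\dot{\mathsf{R}}\mathsf{R}^\top + \mathsf{R}\dot{\mathsf{R}}^\top = 0 \quad\Rightarrow\quad \dot{\mathsf{R}}\mathsf{R}^\top = -\mathsf{R}\dot{\mathsf{R}}^\top = -(\dot{\mathsf{R}}\mathsf{R}^\top)^\top$$

Thus $\dot{\mathsf{R}}\mathsf{R}^\top$ is anti-symmetric:

$$\dot{\mathsf{R}}\mathsf{R}^\top = \begin{bmatrix} 0 & -z & y \\ z & 0 & -x \\ -y & x & 0 \end{bmatrix}$$

Now you can verify for yourself that application of a matrix of this form to an
arbitrary vector has precisely the same effect as the cross product operator, $\boldsymbol{\omega} \times$,
where $\boldsymbol{\omega} = [x\ y\ z]^\top$. Lo and behold, we then have

$$\dot{\mathbf{r}} = \boldsymbol{\omega} \times \mathbf{r}$$

matching the equation at the end of lecture 2, $\mathbf{v} = \boldsymbol{\omega} \times \mathbf{r}$, as we would hope/expect.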
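
The claim that an anti-symmetric matrix of this form acts like $\boldsymbol{\omega} \times$ is easy to verify numerically. A minimal sketch in plain Python follows; the function names `skew`, `matvec` and `cross` and the sample vectors are hypothetical.

```python
def skew(w):
    """Anti-symmetric matrix built from w = [x, y, z], as in the text."""
    x, y, z = w
    return [[0.0, -z, y],
            [z, 0.0, -x],
            [-y, x, 0.0]]

def matvec(m, v):
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]

def cross(u, v):
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]

omega = [0.2, -1.5, 0.7]    # arbitrary angular velocity components
r = [3.0, 0.5, -2.0]        # arbitrary position vector

lhs = matvec(skew(omega), r)
rhs = cross(omega, r)
max_difference = max(abs(a - b) for a, b in zip(lhs, rhs))

print(lhs, rhs)
```

The two results agree component by component for any choice of `omega` and `r`.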

3.5.1 Rotation: Part 2 

Now suppose $\mathbf{p}$ is the position vector of a point P which moves in the rotating
frame. There will be two contributions to motion with respect to the fixed frame,
one due to its motion within the rotating frame, and one due to the rotation itself.
So, returning to the equations we derived earlier:

$$\mathbf{r}(t) = \mathsf{R}(t)\mathbf{p}(t)$$

and the instantaneous differential with respect to time:

$$\frac{d\mathbf{r}}{dt} = \dot{\mathsf{R}}\mathbf{p} + \mathsf{R}\dot{\mathbf{p}} = \dot{\mathsf{R}}\mathsf{R}^\top\mathbf{r} + \mathsf{R}\dot{\mathbf{p}}$$

Now $\mathbf{p}$ is not constant, so its differential is not zero; hence, rewriting this last
equation, the instantaneous velocity of P in the fixed frame is

$$\frac{d\mathbf{r}}{dt} = \mathsf{R}\dot{\mathbf{p}} + \boldsymbol{\omega} \times \mathbf{r}$$

The second term, of course, is the contribution from the rotating frame which we
saw previously. The first is the linear velocity measured in the rotating frame, $\dot{\mathbf{p}}$,
referred to the fixed frame (via the rotation matrix $\mathsf{R}$ which aligns the two frames).
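3.5.2 Rotation 3: Instantaneous acceleration

Our previous result is a general one relating the time derivatives of any vector in
rotating and non-rotating frames. Let us now consider the second differential:

$$\ddot{\mathbf{r}} = \dot{\boldsymbol{\omega}} \times \mathbf{r} + \boldsymbol{\omega} \times \dot{\mathbf{r}} + \dot{\mathsf{R}}\dot{\mathbf{p}} + \mathsf{R}\ddot{\mathbf{p}}$$

We shall assume that the angular acceleration is zero, which kills off the first term,
and so now, substituting for $\dot{\mathbf{r}}$ we have

$$\begin{aligned}
\ddot{\mathbf{r}} &= \boldsymbol{\omega} \times (\boldsymbol{\omega} \times \mathbf{r} + \mathsf{R}\dot{\mathbf{p}}) + \dot{\mathsf{R}}\dot{\mathbf{p}} + \mathsf{R}\ddot{\mathbf{p}} \\
&= \boldsymbol{\omega} \times (\boldsymbol{\omega} \times \mathbf{r}) + \boldsymbol{\omega} \times \mathsf{R}\dot{\mathbf{p}} + \dot{\mathsf{R}}\dot{\mathbf{p}} + \mathsf{R}\ddot{\mathbf{p}} \\
&= \boldsymbol{\omega} \times (\boldsymbol{\omega} \times \mathbf{r}) + \boldsymbol{\omega} \times \mathsf{R}\dot{\mathbf{p}} + \dot{\mathsf{R}}(\mathsf{R}^\top\mathsf{R})\dot{\mathbf{p}} + \mathsf{R}\ddot{\mathbf{p}} \\
&= \boldsymbol{\omega} \times (\boldsymbol{\omega} \times \mathbf{r}) + 2\boldsymbol{\omega} \times (\mathsf{R}\dot{\mathbf{p}}) + \mathsf{R}\ddot{\mathbf{p}}
\end{aligned}$$

The instantaneous acceleration is therefore

$$\ddot{\mathbf{r}} = \mathsf{R}\ddot{\mathbf{p}} + 2\boldsymbol{\omega} \times (\mathsf{R}\dot{\mathbf{p}}) + \boldsymbol{\omega} \times (\boldsymbol{\omega} \times \mathbf{r})$$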


• The first term is the acceleration of the point P in the rotating frame, measured
in the rotating frame, but referred to the fixed frame by the rotation $\mathsf{R}$.

• The last term is the centripetal acceleration due to the rotation. (Yes! Its
magnitude is $\omega^2 r$ and its direction is that of $-\mathbf{r}$. Check it out.)

• The middle term is an extra term which arises because of the velocity of P
in the rotating frame. It is known as the Coriolis acceleration, named after
the French engineer who first identified it.

Because of the rotation of the earth, the Coriolis acceleration is of great
importance in meteorology and accounts for the occurrence of high pressure
anti-cyclones and low pressure cyclones in the northern hemisphere, in which the Coriolis
acceleration is produced by a pressure gradient. It is also a very important
component of the acceleration (hence the force exerted) by a rapidly moving robot arm,
whose links whirl rapidly about rotary joints.

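Example

Q Find the instantaneous acceleration of a projectile fired along a line of longitude
(with angular velocity of $\boldsymbol{\gamma}$ constant relative to the sphere) if the sphere
is rotating with angular velocity $\boldsymbol{\omega}$.

A Consider a coordinate frame defined by mutually orthogonal unit vectors,
$\hat{\mathbf{l}}$, $\hat{\mathbf{m}}$ and $\hat{\mathbf{n}}$, as shown in Fig. 3.2. We shall assume, without loss of generality,
that the fixed and rotating frames are instantaneously aligned at the moment
shown in the diagram, so that $\mathsf{R} = \mathsf{I}$, the identity, and hence $\mathbf{r} = \mathbf{p}$.

In the rotating frame

$$\dot{\mathbf{p}} = \boldsymbol{\gamma} \times \mathbf{p} \quad\text{and}\quad \ddot{\mathbf{p}} = \boldsymbol{\gamma} \times \dot{\mathbf{p}} = \boldsymbol{\gamma} \times (\boldsymbol{\gamma} \times \mathbf{p})$$

So in the fixed reference frame, because these two frames are instantaneously
aligned

$$\ddot{\mathbf{r}} = \boldsymbol{\gamma} \times (\boldsymbol{\gamma} \times \mathbf{p}) + 2\boldsymbol{\omega} \times (\boldsymbol{\gamma} \times \mathbf{p}) + \boldsymbol{\omega} \times (\boldsymbol{\omega} \times \mathbf{r}) .$$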

The first term is the centripetal acceleration due to the projectile moving
around the sphere, which it does because of the gravitational force. The
last term is the centripetal acceleration resulting from the rotation of the
sphere. The middle term is the Coriolis acceleration.

Using Fig. 3.2, at some instant $t$

$$\mathbf{r}(t) = \mathbf{p}(t) = r\cos(\gamma t)\hat{\mathbf{m}} + r\sin(\gamma t)\hat{\mathbf{n}}$$

and

$$\boldsymbol{\gamma} = \gamma\hat{\mathbf{l}}$$

Then

$$\boldsymbol{\gamma} \times (\boldsymbol{\gamma} \times \mathbf{p}) = (\boldsymbol{\gamma}\cdot\mathbf{p})\boldsymbol{\gamma} - \gamma^2\mathbf{p} = -\gamma^2\mathbf{p} = -\gamma^2\mathbf{r} .$$

Check the direction: the negative sign means it points towards the centre
of the sphere, which is as expected.

Likewise the last term can be obtained as

$$\boldsymbol{\omega} \times (\boldsymbol{\omega} \times \mathbf{r}) = -\omega^2 r\sin(\gamma t)\hat{\mathbf{n}}$$

(Note that it is perpendicular to the axis of rotation $\hat{\mathbf{m}}$, and because of the
minus sign, directed towards the axis.)

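The Coriolis term is derived as:

$$2\boldsymbol{\omega} \times \dot{\mathbf{p}} = 2\boldsymbol{\omega} \times (\boldsymbol{\gamma} \times \mathbf{p}) = 2\begin{bmatrix} 0 \\ \omega \\ 0 \end{bmatrix} \times \left( \begin{bmatrix} \gamma \\ 0 \\ 0 \end{bmatrix} \times \begin{bmatrix} 0 \\ r\cos\gamma t \\ r\sin\gamma t \end{bmatrix} \right) = 2\omega\gamma r\cos\gamma t\,\hat{\mathbf{l}}$$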
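
All three acceleration terms of this example can be evaluated directly from nested cross products. The sketch below works in components along $(\hat{\mathbf{l}}, \hat{\mathbf{m}}, \hat{\mathbf{n}})$, assumed here to form a right-handed set, with $\boldsymbol{\gamma}$ along $\hat{\mathbf{l}}$ and $\boldsymbol{\omega}$ along $\hat{\mathbf{m}}$; the numerical values are hypothetical.

```python
import math

def cross(u, v):
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]

gamma, omega, r, t = 0.3, 0.05, 2.0, 1.2   # hypothetical rates, radius and time

# Components listed in the order (l, m, n).
gamma_vec = [gamma, 0.0, 0.0]
omega_vec = [0.0, omega, 0.0]
p = [0.0, r * math.cos(gamma * t), r * math.sin(gamma * t)]

centripetal_path = cross(gamma_vec, cross(gamma_vec, p))          # gamma x (gamma x p)
coriolis = [2.0 * c for c in cross(omega_vec, cross(gamma_vec, p))]  # 2 omega x (gamma x p)
centripetal_spin = cross(omega_vec, cross(omega_vec, p))          # omega x (omega x r), r = p

print(centripetal_path)
print(coriolis)
print(centripetal_spin)
```

The Coriolis term should have only an $\hat{\mathbf{l}}$ component, equal to $2\omega\gamma r\cos\gamma t$, and the last term only an $\hat{\mathbf{n}}$ component, equal to $-\omega^2 r\sin\gamma t$.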


Instead of a projectile, now consider a rocket on rails which stretch north
from the equator. As the rocket travels north it experiences the Coriolis force
(exerted by the rails):

$$2\,\gamma\,\omega\,R\cos\gamma t\,\hat{\mathbf{l}}$$

in which $\gamma$, $R$ and $\cos\gamma t$ are positive and $\omega$ is negative.

Hence the Coriolis force is in the direction opposed to $\hat{\mathbf{l}}$ (i.e. in the opposite
direction to the earth's rotation). In the absence of the rails (or atmosphere)
the rocket's tangential speed (relative to the surface of the earth) is greater
than the speed of the surface of the earth underneath it (since the radius
of successive lines of latitude decreases) so it would (to an observer on the
earth) appear to deflect to the east. The rails provide a Coriolis force keeping
it on the same meridian.
Figure 3.3: Rocket example. Labels: rocket's velocity in direction of meridian; tangential component of velocity
(NB instantaneously common to earth's surface and rocket).

Figure 3.4: Coriolis effect giving rise to weather systems

Lecture 4


Line, Surface and Volume Integrals. 
Curvilinear coordinates. 


We started off the course being concerned with individual vectors a, b, c, and so 
on. 

We went on to consider how single vectors vary over time or over some other 
parameter such as arc length. 

In much of the rest of the course, we will be concerned with scalars and vectors
which are defined over regions in space: scalar and vector fields.

In this lecture we introduce line, surface and volume integrals, and consider how
these are defined in non-Cartesian, curvilinear coordinates.


4.1 Scalar and vector fields 

When a scalar function u( r) is determined or defined at each position r in some 
region, we say that u is a scalar field in that region. 

Similarly, if a vector function v(r) is defined at each point, then v is a vector field 
in that region. As you will see, in field theory our aim is to derive statements about 
the bulk properties of scalar and vector fields, rather than to deal with individual 
scalars or vectors. 

Familiar examples of each are shown in figure 4.1. 

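In Lecture 1 we worked out the force $\mathbf{F}(\mathbf{r})$ on a charge $Q$ arising from a number
of charges $q_i$. The electric field is $\mathbf{F}/Q$, so

$$\mathbf{E}(\mathbf{r}) = \sum_{i=1}^{N} \frac{q_i}{4\pi\epsilon_0} \frac{\mathbf{r} - \mathbf{r}_i}{|\mathbf{r} - \mathbf{r}_i|^3}$$

where $\mathbf{r}_i$ is the position of charge $q_i$.
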
For example, you could work out the velocity field, in plane polars, at any point on
a wheel spinning about its axis

$$\mathbf{v}(\mathbf{r}) = \boldsymbol{\omega} \times \mathbf{r}$$

or the fluid flow field around a wing.

Figure 4.1: Examples of (a) a scalar field (pressure); (b) a vector field (wind velocity)

If the fields are independent of time, they are said to be steady. Of course, most 
vector fields of practical interest in engineering science are not steady, and some 
are unpredictable. 

Let us first consider how to perform a variety of types of integration in vector and 
scalar fields. 


4.2 Line integrals through fields 

Line integrals are concerned with measuring the integrated interaction with a field 
as you move through it on some defined path. Eg, given a map showing the 
pollution density field in Oxford, you may wish to work out how much pollution 
you breathe in when cycling from college to the Department via different routes. 

First recall the definition of an integral for a scalar function $f(x)$ of a single scalar
variable $x$. One assumes a set of $n$ samples $f_i = f(x_i)$ spaced by $\delta x_i$. One forms
the limit of the sum of the products $f(x_i)\delta x_i$ as the number of samples tends to
infinity

$$\int f(x)\,dx = \lim_{\substack{n \to \infty \\ \delta x_i \to 0}} \sum_{i=1}^{n} f_i\,\delta x_i .$$

For a smooth function, it is irrelevant how the function is subdivided.

4.2.1 Vector line integrals 

In a vector line integral, the path L along which the integral is to be evaluated
is split into a large number of vector segments $\delta\mathbf{r}_i$. Each line segment is then
multiplied by the quantity associated with that point in space, the products are
then summed and the limit taken as the lengths of the segments tend to zero.

Figure 4.2: Line integral. In the diagram $\mathbf{F}(\mathbf{r})$ is a vector field, but it could be replaced with scalar
field $U(\mathbf{r})$.

There are three types of integral we have to think about, depending on the nature
of the product:

1. Integrand $U(\mathbf{r})$ is a scalar field, hence the integral is a vector.

$$\mathbf{I} = \int_L U(\mathbf{r})\,d\mathbf{r} = \lim_{\delta\mathbf{r}_i \to 0} \sum_i U_i\,\delta\mathbf{r}_i$$

2. Integrand $\mathbf{a}(\mathbf{r})$ is a vector field dotted with $d\mathbf{r}$ hence the integral is a scalar:

$$I = \int_L \mathbf{a}(\mathbf{r})\cdot d\mathbf{r} = \lim_{\delta\mathbf{r}_i \to 0} \sum_i \mathbf{a}_i\cdot\delta\mathbf{r}_i .$$

3. Integrand $\mathbf{a}(\mathbf{r})$ is a vector field crossed with $d\mathbf{r}$ hence vector result.

$$\mathbf{I} = \int_L \mathbf{a}(\mathbf{r}) \times d\mathbf{r} = \lim_{\delta\mathbf{r}_i \to 0} \sum_i \mathbf{a}_i \times \delta\mathbf{r}_i$$

Note immediately that unlike an integral in a single scalar variable, there are many
paths L from start point $\mathbf{r}_A$ to end point $\mathbf{r}_B$, and the integral will in general depend
on the path taken.


Physical examples of line integrals 

• The total work done by a force $\mathbf{F}$ as it moves a point from A to B along
a given path C is given by a line integral of type 2 above. If the force acts
at point $\mathbf{r}$ and the instantaneous displacement along curve C is $d\mathbf{r}$ then the
infinitesimal work done is $dW = \mathbf{F}\cdot d\mathbf{r}$, and so the total work done traversing
the path is

$$W_C = \int_C \mathbf{F}\cdot d\mathbf{r}$$

• Ampere's law relating magnetic field $\mathbf{B}$ to linked current can be written as

$$\oint_C \mathbf{B}\cdot d\mathbf{r} = \mu_0 I$$

where $I$ is the current enclosed by (closed) path C.

• The force on an element of wire carrying current $I$, placed in a magnetic field
of strength $\mathbf{B}$, is $d\mathbf{F} = I\,d\mathbf{r} \times \mathbf{B}$. So if a loop of this wire C is placed in the field
then the total force will be an integral of type 3 above:

$$\mathbf{F} = I\oint_C d\mathbf{r} \times \mathbf{B}$$

Note that the expressions above are beautifully compact in vector notation, and are 
all independent of coordinate system. Of course when evaluating them we need 
to choose a coordinate system: often this is the standard Cartesian coordinate 
system (as in the worked examples below), but need not be, as we shall see in 
section 4.6. 


Examples

Q1 An example in the xy-plane. A force $\mathbf{F} = x^2y\,\hat{\imath} + xy^2\,\hat{\jmath}$ acts on a body as it
moves between (0, 0) and (1, 1).

Determine the work done when the path is

1. along the line $y = x$.

2. along the curve $y = x^n$, $n > 0$.

3. along the x axis to the point (1, 0) and then along the line $x = 1$.

A1 This is an example of the “type 2” line integral. In planar Cartesians,
$d\mathbf{r} = \hat{\imath}\,dx + \hat{\jmath}\,dy$. Then the work done is

$$\int_L \mathbf{F}\cdot d\mathbf{r} = \int_L (x^2y\,dx + xy^2\,dy)$$

1. For the path $y = x$ we find that $dy = dx$. So it is easiest to convert all
$y$ references to $x$.

$$\int_{(0,0)}^{(1,1)} (x^2y\,dx + xy^2\,dy) = \int_{x=0}^{x=1} (x^2x\,dx + xx^2\,dx) = \int_{x=0}^{x=1} 2x^3\,dx = \left[x^4/2\right]_{x=0}^{x=1} = 1/2 .$$
Figure 4.3: Line integral taken along three different paths

2. For the path $y = x^n$ we find that $dy = nx^{n-1}dx$, so again it is easiest to
convert all $y$ references to $x$.

$$\begin{aligned}
\int_{(0,0)}^{(1,1)} (x^2y\,dx + xy^2\,dy) &= \int_{x=0}^{x=1} (x^{n+2}\,dx + nx^{n-1}\cdot x\cdot x^{2n}\,dx) \\
&= \int_{x=0}^{x=1} (x^{n+2}\,dx + nx^{3n}\,dx) \\
&= \frac{1}{n+3} + \frac{n}{3n+1}
\end{aligned}$$

3. This path is not smooth, so break it into two. Along the first section,
$y = 0$ and $dy = 0$, and on the second $x = 1$ and $dx = 0$, so

$$\int_A^B (x^2y\,dx + xy^2\,dy) = \int_{x=0}^{x=1} (x^2\,0\,dx) + \int_{y=0}^{y=1} 1\cdot y^2\,dy = 0 + \left[y^3/3\right]_{y=0}^{y=1} = 1/3 .$$

So in general the integral depends on the path taken. Notice that answer (1)
is the same as answer (2) when $n = 1$, and that answer (3) is the limiting
value of answer (2) as $n \to \infty$.
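
The three answers can be checked by brute-force quadrature. The sketch below parameterises each path by $p \in [0, 1]$ and sums $\mathbf{F}\cdot d\mathbf{r}$ with a midpoint rule; the function names `work` and `power_path`, and the number of steps, are illustrative choices rather than anything prescribed by the notes.

```python
def work(path, dpath, steps=20000):
    """Midpoint-rule estimate of the integral of F . dr for F = (x^2 y, x y^2),
    along (x(p), y(p)) with p running from 0 to 1."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        p = (i + 0.5) * h
        x, y = path(p)
        dx, dy = dpath(p)
        total += (x * x * y * dx + x * y * y * dy) * h
    return total

def power_path(n):
    """The curve y = x^n from (0, 0) to (1, 1), parameterised by x = p."""
    return (lambda p: (p, p ** n),
            lambda p: (1.0, n * p ** (n - 1)))

w_line = work(*power_path(1))                    # path 1: y = x
w_cubic = work(*power_path(3))                   # path 2 with n = 3
w_corner = (work(lambda p: (p, 0.0), lambda p: (1.0, 0.0))      # along the x axis
            + work(lambda p: (1.0, p), lambda p: (0.0, 1.0)))   # then up x = 1

print(w_line, w_cubic, w_corner)
```

For path 2 the estimate can be compared with $1/(n+3) + n/(3n+1)$ for whichever $n$ is chosen.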


Q2 Repeat part (2) using the force $\mathbf{F} = xy^2\,\hat{\imath} + x^2y\,\hat{\jmath}$.

A2 For the path $y = x^n$ we find that $dy = nx^{n-1}dx$, so

$$\begin{aligned}
\int_{(0,0)}^{(1,1)} (y^2x\,dx + yx^2\,dy) &= \int_{x=0}^{x=1} (x^{2n+1}\,dx + nx^{n-1}\cdot x^2\cdot x^n\,dx) \\
&= \int_{x=0}^{x=1} (x^{2n+1}\,dx + nx^{2n+1}\,dx) \\
&= \frac{1}{2n+2} + \frac{n}{2n+2} \\
&= \frac{1}{2} \quad\text{independent of } n
\end{aligned}$$


4.3 Line integrals in Conservative fields 

In the second example, the line integral has the same value for the whole range
of paths. In fact it is wholly independent of path. This is easy to see if we write
$g(x, y) = x^2y^2/2$. Then using the definition of the perfect differential

$$dg = \frac{\partial g}{\partial x}dx + \frac{\partial g}{\partial y}dy$$

we find that

$$\int_A^B (y^2x\,dx + yx^2\,dy) = \int_A^B dg = g_B - g_A$$

which depends solely on the value of $g$ at the start and end points, and not at all
on the path used to get from A to B. Such a vector field is called conservative.
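
Path independence can be seen numerically by integrating $\mathbf{F} = xy^2\,\hat{\imath} + x^2y\,\hat{\jmath}$ along several curves $y = x^n$ and comparing each result with $g_B - g_A$. This is a sketch only; the function names, the choice of exponents and the step count are hypothetical.

```python
def line_integral(field, n, steps=20000):
    """Midpoint-rule estimate of the integral of field . dr along y = x^n, x from 0 to 1."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        y = x ** n
        dy_dx = n * x ** (n - 1)
        fx, fy = field(x, y)
        total += (fx + fy * dy_dx) * h
    return total

def conservative_field(x, y):
    return (x * y * y, x * x * y)

def g(x, y):
    """Potential-like function whose perfect differential is y^2 x dx + y x^2 dy."""
    return x * x * y * y / 2.0

values = [line_integral(conservative_field, n) for n in (1, 2, 5)]
potential_difference = g(1.0, 1.0) - g(0.0, 0.0)

print(values, potential_difference)
```

Every entry of `values` should match `potential_difference` to within the quadrature error, whatever exponent is used for the path.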

One sort of line integral performs the integration around a complete loop and is
denoted with a ring. If $\mathbf{E}$ is a conservative field, determine the value of

$$\oint \mathbf{E}\cdot d\mathbf{r} .$$

In electrostatics, if $\mathbf{E}$ is the electric field then the potential function is

$$\phi = -\int \mathbf{E}\cdot d\mathbf{r} .$$

Do you think $\mathbf{E}$ is conservative?
4.3.1 A note on line integrals defined in terms of arc length

Line integrals are often defined in terms of scalar arc length. They don't appear
to involve vectors (but actually they are another form of type 2 defined earlier).

The integrals usually appear as follows

$$I = \int_L F(x, y, z)\,ds$$

and most often the path L is along a curve defined parametrically as $x = x(p)$,
$y = y(p)$, $z = z(p)$ where $p$ is some parameter. Convert the function to $F(p)$,
writing

$$I = \int_{p_{\rm start}}^{p_{\rm end}} F(p)\frac{ds}{dp}\,dp$$

where

$$\frac{ds}{dp} = \left[\left(\frac{dx}{dp}\right)^2 + \left(\frac{dy}{dp}\right)^2 + \left(\frac{dz}{dp}\right)^2\right]^{1/2} .$$

Note that the parameter $p$ could be arc-length $s$ itself, in which case $ds/dp = 1$
of course! Another possibility is that the parameter $p$ is $x$, that is we are told
$y = y(x)$ and $z = z(x)$. Then

$$I = \int_{x_{\rm start}}^{x_{\rm end}} F(x)\left[1 + \left(\frac{dy}{dx}\right)^2 + \left(\frac{dz}{dx}\right)^2\right]^{1/2} dx .$$


4.4 Surface integrals 

These can be defined by analogy with line integrals. 

The surface S over which the integral is to be evaluated is now divided into
infinitesimal vector elements of area $d\mathbf{S}$, the direction of the vector $d\mathbf{S}$ representing
the direction of the surface normal and its magnitude representing the area of the
element.

Again there are three possibilities:

• $\int_S U\,d\mathbf{S}$: scalar field $U$; vector integral.

• $\int_S \mathbf{a}\cdot d\mathbf{S}$: vector field $\mathbf{a}$; scalar integral.

• $\int_S \mathbf{a} \times d\mathbf{S}$: vector field $\mathbf{a}$; vector integral.

(in addition, of course, to the purely scalar form, $\int_S U\,dS$).
Physical example of surface integral

• Physical examples of surface integrals with vectors often involve the idea of
flux of a vector field through a surface, $\int_S \mathbf{a}\cdot d\mathbf{S}$. For example the mass of fluid
crossing an element $d\mathbf{S}$ of a surface S in time $dt$ is $dM = \rho\mathbf{v}\cdot d\mathbf{S}\,dt$ where $\rho(\mathbf{r})$ is the fluid
density and $\mathbf{v}(\mathbf{r})$ is the fluid velocity. The total mass flux can be expressed as
a surface integral:

$$\Phi_M = \int_S \rho(\mathbf{r})\mathbf{v}(\mathbf{r})\cdot d\mathbf{S}$$


Again, though this expression is coordinate free, we evaluate an example below 
using Cartesians. Note, however, that in some problems, symmetry may lead us 
to a different more natural coordinate system. 


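Example

Evaluate $\int \mathbf{F}\cdot d\mathbf{S}$ over the $x = 1$ side of
the cube shown in the figure when $\mathbf{F} = y\,\hat{\imath} + z\,\hat{\jmath} + x\,\hat{k}$.

$d\mathbf{S}$ is perpendicular to the surface. Its $\pm$
direction actually depends on the nature
of the problem. More often than not,
the surface will enclose a volume, and the
surface direction is taken as everywhere
emanating from the interior.

Hence for the $x = 1$ face of the cube

$$d\mathbf{S} = dy\,dz\,\hat{\imath}$$

and

$$\int \mathbf{F}\cdot d\mathbf{S} = \iint y\,dy\,dz ,$$

with the integration running over the extent of the face in $y$ and $z$.
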

4.5 Volume integrals 

The definition of the volume integral is again taken as the limit of a sum of products 
as the size of the volume element tends to zero. One obvious difference though is 
that the element of volume is a scalar (how could you define a direction with an 
infinitesimal volume element?). The possibilities are: 
• $\int_V U(\mathbf{r})\,dV$: scalar field; scalar integral.

• $\int_V \mathbf{a}\,dV$: vector field; vector integral.

You have covered these (more or less) in your first year course, so not much more 
to say here. The next section considers these again in the context of a change of 
coordinates. 


4.6 Changing variables: curvilinear coordinates 

Up to now we have been concerned with Cartesian coordinates $x, y, z$ with
coordinate axes $\hat{\imath}, \hat{\jmath}, \hat{k}$. When performing a line integral in Cartesian
coordinates, you write

$$\mathbf{r} = x\hat{\imath} + y\hat{\jmath} + z\hat{k} \quad\text{and}\quad d\mathbf{r} = dx\,\hat{\imath} + dy\,\hat{\jmath} + dz\,\hat{k}$$

and can be sure that length scales are properly handled because - as we saw in
Lecture 3 -

$$|d\mathbf{r}| = ds = \sqrt{dx^2 + dy^2 + dz^2} .$$

The reason for using the basis $\hat{\imath}, \hat{\jmath}, \hat{k}$ rather than any other orthonormal basis set is
that $\hat{\imath}$ represents a direction in which $x$ is increasing while the other two coordinates
remain constant (and likewise for $\hat{\jmath}$ and $\hat{k}$ with $y$ and $z$ respectively), simplifying
the representation and resulting mathematics.

Often the symmetry of the problem strongly hints at using another coordinate 
system: 

• likely to be plane, cylindrical, or spherical polars, 

• but can be something more exotic 

The general name for any different “u, v, w" coordinate system is a curvilinear
coordinate system. We will see that the idea hinted at above - of defining a
basis set by considering directions in which only one coordinate is (instantaneously)
increasing - provides the appropriate generalisation.

We begin by discussing common special cases: cylindrical polars and spherical 
polars, and conclude with a more general formulation. 

4.6.1 Cylindrical polar coordinates 

As shown in figure 4.4, a point in space P having Cartesian coordinates $x, y, z$ can
be expressed in terms of cylindrical polar coordinates $r, \phi, z$ as follows:

$$\mathbf{r} = x\hat{\imath} + y\hat{\jmath} + z\hat{k} = r\cos\phi\,\hat{\imath} + r\sin\phi\,\hat{\jmath} + z\hat{k}$$

Figure 4.4: Cylindrical polars: (a) coordinate definition; (b) “iso” lines in $r$, $\phi$ and $z$.

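Note that, by definition, $\partial\mathbf{r}/\partial r$ represents a direction in which (instantaneously) $r$ is
changing while the other two coordinates stay constant. That is, it is tangent to
lines of constant $\phi$ and $z$. Likewise for $\partial\mathbf{r}/\partial\phi$ and $\partial\mathbf{r}/\partial z$. Thus the vectors

$$\mathbf{e}_r = \frac{\partial\mathbf{r}}{\partial r} = \cos\phi\,\hat{\imath} + \sin\phi\,\hat{\jmath}$$

$$\mathbf{e}_\phi = \frac{\partial\mathbf{r}}{\partial\phi} = -r\sin\phi\,\hat{\imath} + r\cos\phi\,\hat{\jmath}$$

$$\mathbf{e}_z = \frac{\partial\mathbf{r}}{\partial z} = \hat{k}$$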


Aside on notation: some texts use the notation $\hat{\mathbf{r}}, \hat{\boldsymbol{\phi}}, \ldots$ to represent the unit
vectors that form the local basis set. Though I prefer the notation used here, where
the basis vectors are written as $\mathbf{e}$ with appropriate subscripts (as used in Riley et al),
you should be aware of, and comfortable with, either possibility.


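form a basis set in which we may describe infinitesimal vector displacements in
the position of P, $d\mathbf{r}$. It is more usual, however, first to normalise the vectors to
obtain their corresponding unit vectors, $\hat{\mathbf{e}}_r, \hat{\mathbf{e}}_\phi, \hat{\mathbf{e}}_z$. Following the usual rules of
calculus we may write:

$$\begin{aligned}
d\mathbf{r} &= \frac{\partial\mathbf{r}}{\partial r}\,dr + \frac{\partial\mathbf{r}}{\partial\phi}\,d\phi + \frac{\partial\mathbf{r}}{\partial z}\,dz \\
&= dr\,\mathbf{e}_r + d\phi\,\mathbf{e}_\phi + dz\,\mathbf{e}_z \\
&= dr\,\hat{\mathbf{e}}_r + r\,d\phi\,\hat{\mathbf{e}}_\phi + dz\,\hat{\mathbf{e}}_z
\end{aligned}$$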


Now here is the important thing to note. In Cartesian coordinates, a small change
in (e.g.) $x$ while keeping $y$ and $z$ constant would result in a displacement of

$$ds = |d\mathbf{r}| = \sqrt{d\mathbf{r} \cdot d\mathbf{r}} = \sqrt{dx^2 + 0 + 0} = dx$$

But in cylindrical polars, a small change in $\phi$ of $d\phi$ while keeping $r$ and $z$ constant
results in a displacement of

$$ds = |d\mathbf{r}| = \sqrt{r^2 (d\phi)^2} = r\, d\phi$$

Thus the size of the (infinitesimal) displacement is dependent on the value of $r$.
Factors such as this $r$ are known as scale factors or metric coefficients, and we
must be careful to take them into account when, e.g., performing line, surface or
volume integrals, as you will see below. For cylindrical polars the metric coefficients
are clearly 1, $r$ and 1.
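The scale factors can also be estimated numerically, which makes the point that they are just the lengths of $\partial\mathbf{r}/\partial r$, $\partial\mathbf{r}/\partial\phi$ and $\partial\mathbf{r}/\partial z$. The following is a minimal sketch using central differences; the helper names and the sample point are arbitrary choices, not part of the notes.

```python
import math


def position(r, phi, z):
    # Cartesian position of the point with cylindrical coordinates (r, phi, z)
    return (r * math.cos(phi), r * math.sin(phi), z)


def scale_factor(coords, index, eps=1e-6):
    """Estimate |d(position)/d(coords[index])| by a central difference."""
    lo, hi = list(coords), list(coords)
    lo[index] -= eps
    hi[index] += eps
    return math.dist(position(*hi), position(*lo)) / (2 * eps)


point = (2.0, 0.7, -1.0)  # arbitrary sample point with r = 2
h_r, h_phi, h_z = (scale_factor(point, i) for i in range(3))
```

At a point with $r = 2$ the three estimates should be close to 1, 2 and 1 respectively.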

Example: line integral in cylindrical coordinates 

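Q Evaluate $\oint_C \mathbf{a} \cdot d\mathbf{l}$, where $\mathbf{a} = x^3\hat{\jmath} - y^3\hat{\imath} + x^2 y\,\hat{k}$ and C is the circle of radius $r$
in the $z = 0$ plane, centred on the origin.

A Consider figure 4.5. In this case our cylindrical coordinates effectively reduce
to plane polars since the path of integration is a circle in the $z = 0$ plane, but
let's persist with the full set of coordinates anyway; the $\hat{k}$ component of $\mathbf{a}$
will play no role (it is normal to the path of integration and therefore drops out
of $\mathbf{a} \cdot d\mathbf{r}$, as seen below).

On the circle of interest

$$\mathbf{a} = r^3\left(-\sin^3\phi\,\hat{\imath} + \cos^3\phi\,\hat{\jmath} + \cos^2\phi\,\sin\phi\,\hat{k}\right)$$

and (since $dz = dr = 0$ on the path)

$$d\mathbf{r} = r\, d\phi\, \hat{\mathbf{e}}_\phi = r\, d\phi\left(-\sin\phi\,\hat{\imath} + \cos\phi\,\hat{\jmath}\right)$$

so that

$$\oint_C \mathbf{a} \cdot d\mathbf{r} = \int_0^{2\pi} r^4\left(\sin^4\phi + \cos^4\phi\right) d\phi = \frac{3\pi}{2}\, r^4$$

since

$$\int_0^{2\pi} \sin^4\phi\, d\phi = \int_0^{2\pi} \cos^4\phi\, d\phi = \frac{3\pi}{4} .$$

Figure 4.5: Line integral example in cylindrical coordinates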
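The line integral in this example can be checked numerically by stepping round the circle and summing $\mathbf{a} \cdot d\mathbf{r}$ with $d\mathbf{r} = r\, d\phi\, \hat{\mathbf{e}}_\phi$. This is only a sketch: the function names, the number of steps and the radius 1.5 are arbitrary choices.

```python
import math


def a(x, y, z):
    # Field of the example: a = -y^3 i + x^3 j + x^2 y k
    return (-y**3, x**3, x**2 * y)


def circulation(radius, n=2000):
    """Midpoint-rule estimate of the closed line integral of a . dr round the
    circle of the given radius in the z = 0 plane."""
    dphi = 2 * math.pi / n
    total = 0.0
    for i in range(n):
        phi = (i + 0.5) * dphi
        x, y = radius * math.cos(phi), radius * math.sin(phi)
        ax, ay, _ = a(x, y, 0.0)
        # dr = r dphi e_phi = r dphi (-sin phi, cos phi, 0); the k part drops out
        total += (-ax * math.sin(phi) + ay * math.cos(phi)) * radius * dphi
    return total


value = circulation(1.5)
expected = 1.5 * math.pi * 1.5**4  # (3 pi / 2) r^4
```

The two numbers should agree closely, which is a useful sanity check on the $\sin^4\phi + \cos^4\phi$ integrand.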


Volume integrals in cylindrical polars 

In Cartesian coordinates a volume element is given by (see figure 4.6a):

$$dV = dx\, dy\, dz$$

Recall that the volume of a parallelepiped is given by the scalar triple product of
the vectors which define it (see section 2.1.2). Thus the formula above can be
derived (even though it is “obvious") as:

$$dV = dx\,\hat{\imath} \cdot (dy\,\hat{\jmath} \times dz\,\hat{k}) = dx\, dy\, dz$$

since the basis set is orthonormal.

In cylindrical polars a volume element is given by (see figure 4.6b):

$$dV = dr\,\hat{\mathbf{e}}_r \cdot (r\, d\phi\,\hat{\mathbf{e}}_\phi \times dz\,\hat{\mathbf{e}}_z) = r\, d\phi\, dr\, dz$$

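Note also that this volume, because it is a scalar triple product, can be written as
a determinant:

$$dV = \begin{vmatrix} \hat{\mathbf{e}}_r\, dr \\ \hat{\mathbf{e}}_\phi\, r\, d\phi \\ \hat{\mathbf{e}}_z\, dz \end{vmatrix}
= \begin{vmatrix} \mathbf{e}_r\, dr \\ \mathbf{e}_\phi\, d\phi \\ \mathbf{e}_z\, dz \end{vmatrix}
= \begin{vmatrix}
\dfrac{\partial x}{\partial r} & \dfrac{\partial y}{\partial r} & \dfrac{\partial z}{\partial r} \\
\dfrac{\partial x}{\partial \phi} & \dfrac{\partial y}{\partial \phi} & \dfrac{\partial z}{\partial \phi} \\
\dfrac{\partial x}{\partial z} & \dfrac{\partial y}{\partial z} & \dfrac{\partial z}{\partial z}
\end{vmatrix} dr\, d\phi\, dz$$

where the equality on the right-hand side follows from the definitions of
$\mathbf{e}_r = \dfrac{\partial\mathbf{r}}{\partial r} = \dfrac{\partial x}{\partial r}\hat{\imath} + \dfrac{\partial y}{\partial r}\hat{\jmath} + \dfrac{\partial z}{\partial r}\hat{k}$, etc. This is the explanation for the “magical” appearance of the
determinant in change-of-variables integration that you encountered in your first
year maths!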
Figure 4.6: Volume elements dV in (a) Cartesian coordinates; (b) cylindrical polar coordinates


Surface integrals in cylindrical polars 


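Recall from section 4.4 that for a surface element with normal along $\hat{\imath}$ we have:

$$d\mathbf{S} = dy\, dz\, \hat{\imath}$$

More explicitly, this comes from finding the normal to the plane that is tangent to the
surface of constant $x$ and from finding the area of an infinitesimal area element
on the plane. In this case the plane is spanned by the vectors $\hat{\jmath}$ and $\hat{k}$ and the area
of the element is given by (see section 1.3):

$$dS = \left| dy\,\hat{\jmath} \times dz\,\hat{k} \right|$$

Thus

$$d\mathbf{S} = dy\,\hat{\jmath} \times dz\,\hat{k} = \hat{\imath}\, dS = dy\, dz\, \hat{\imath}$$

In cylindrical polars, surface area elements (see figure 4.7) are given by:

$$d\mathbf{S} = dr\,\hat{\mathbf{e}}_r \times r\, d\phi\,\hat{\mathbf{e}}_\phi = r\, dr\, d\phi\, \hat{\mathbf{e}}_z \quad \text{(for surfaces of constant } z\text{)}$$

$$d\mathbf{S} = r\, d\phi\,\hat{\mathbf{e}}_\phi \times dz\,\hat{\mathbf{e}}_z = r\, d\phi\, dz\, \hat{\mathbf{e}}_r \quad \text{(for surfaces of constant } r\text{)}$$

Similarly we can find $d\mathbf{S}$ for surfaces of constant $\phi$, though since these aren’t as
common this is left as a (relatively easy) exercise.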
Figure 4.7: Surface elements in cylindrical polar coordinates

4.6.2 Spherical polars 

Much of the development for spherical polars is similar to that for cylindrical polars.
As shown in figure 4.6.2, a point in space P having Cartesian coordinates $x, y, z$
can be expressed in terms of spherical polar coordinates $r, \theta, \phi$ as follows:

$$\mathbf{r} = x\hat{\imath} + y\hat{\jmath} + z\hat{k} = r\sin\theta\cos\phi\,\hat{\imath} + r\sin\theta\sin\phi\,\hat{\jmath} + r\cos\theta\,\hat{k}$$


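The basis set in spherical polars is obtained in an analogous fashion: we find unit
vectors which are in the direction of increase of each coordinate:

$$\mathbf{e}_r = \frac{\partial\mathbf{r}}{\partial r} = \sin\theta\cos\phi\,\hat{\imath} + \sin\theta\sin\phi\,\hat{\jmath} + \cos\theta\,\hat{k} = \hat{\mathbf{e}}_r$$

$$\mathbf{e}_\theta = \frac{\partial\mathbf{r}}{\partial\theta} = r\cos\theta\cos\phi\,\hat{\imath} + r\cos\theta\sin\phi\,\hat{\jmath} - r\sin\theta\,\hat{k} = r\,\hat{\mathbf{e}}_\theta$$

$$\mathbf{e}_\phi = \frac{\partial\mathbf{r}}{\partial\phi} = -r\sin\theta\sin\phi\,\hat{\imath} + r\sin\theta\cos\phi\,\hat{\jmath} = r\sin\theta\,\hat{\mathbf{e}}_\phi$$

As with cylindrical polars, it is easily verified that the vectors $\hat{\mathbf{e}}_r, \hat{\mathbf{e}}_\theta, \hat{\mathbf{e}}_\phi$ form an
orthonormal basis.

A small displacement $d\mathbf{r}$ is given by:

$$\begin{aligned}
d\mathbf{r} &= \frac{\partial\mathbf{r}}{\partial r}\,dr + \frac{\partial\mathbf{r}}{\partial\theta}\,d\theta + \frac{\partial\mathbf{r}}{\partial\phi}\,d\phi \\
&= dr\,\mathbf{e}_r + d\theta\,\mathbf{e}_\theta + d\phi\,\mathbf{e}_\phi \\
&= dr\,\hat{\mathbf{e}}_r + r\,d\theta\,\hat{\mathbf{e}}_\theta + r\sin\theta\,d\phi\,\hat{\mathbf{e}}_\phi
\end{aligned}$$

Thus the metric coefficients are 1, $r$, $r\sin\theta$.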


Volume integrals in spherical polars 

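In spherical polars a volume element is given by (see figure 4.8):

$$dV = dr\,\hat{\mathbf{e}}_r \cdot (r\, d\theta\,\hat{\mathbf{e}}_\theta \times r\sin\theta\, d\phi\,\hat{\mathbf{e}}_\phi) = r^2 \sin\theta\, dr\, d\theta\, d\phi$$

Note again that this volume could be written as a determinant, but this is left as
an exercise.


Surface integrals in spherical polars

The most (the only?) useful surface elements in spherical polars are those tangent
to surfaces of constant $r$ (see figure 4.9). The surface direction (unnormalised) is
given by $\hat{\mathbf{e}}_\theta \times \hat{\mathbf{e}}_\phi = \hat{\mathbf{e}}_r$ and the area of an infinitesimal surface element is given by
$|r\, d\theta\,\hat{\mathbf{e}}_\theta \times r\sin\theta\, d\phi\,\hat{\mathbf{e}}_\phi| = r^2 \sin\theta\, d\theta\, d\phi$.

Thus a surface element $d\mathbf{S}$ in spherical polars is given by

$$d\mathbf{S} = r\, d\theta\,\hat{\mathbf{e}}_\theta \times r\sin\theta\, d\phi\,\hat{\mathbf{e}}_\phi = r^2 \sin\theta\, d\theta\, d\phi\, \hat{\mathbf{e}}_r$$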
Example: surface integral in spherical polars

Q Evaluate $\int_S \mathbf{a} \cdot d\mathbf{S}$, where $\mathbf{a} = z^3\hat{k}$ and S is the sphere of radius $A$ centred
on the origin.

A On the surface of the sphere:

$$\mathbf{a} = A^3\cos^3\theta\,\hat{k}, \qquad d\mathbf{S} = A^2 \sin\theta\, d\theta\, d\phi\, \hat{\mathbf{e}}_r$$

Hence

$$\begin{aligned}
\int_S \mathbf{a} \cdot d\mathbf{S} &= \int_{\phi=0}^{2\pi}\int_{\theta=0}^{\pi} A^3\cos^3\theta\; A^2\sin\theta\, [\hat{\mathbf{e}}_r \cdot \hat{k}]\, d\theta\, d\phi \\
&= A^5 \int_0^{2\pi} d\phi \int_0^{\pi} \cos^3\theta\, \sin\theta\, [\cos\theta]\, d\theta \\
&= 2\pi A^5\, \frac{1}{5}\left[-\cos^5\theta\right]_0^{\pi} \\
&= \frac{4\pi A^5}{5}
\end{aligned}$$

Figure 4.9: Surface element $d\mathbf{S}$ in spherical polar coordinates
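The result $4\pi A^5/5$ can be checked by summing $\mathbf{a} \cdot \hat{\mathbf{e}}_r\, A^2 \sin\theta\, d\theta\, d\phi$ over a grid of patches on the sphere. The sketch below does this with a simple midpoint rule; the grid sizes and the radius $A = 2$ are arbitrary illustrative choices.

```python
import math


def sphere_flux(A, n_theta=2000, n_phi=4):
    """Midpoint-rule estimate of the flux of a = z^3 k through the sphere of
    radius A, using dS = A^2 sin(theta) dtheta dphi e_r."""
    dtheta = math.pi / n_theta
    dphi = 2 * math.pi / n_phi
    total = 0.0
    for i in range(n_theta):
        theta = (i + 0.5) * dtheta
        for j in range(n_phi):
            phi = (j + 0.5) * dphi
            e_r = (math.sin(theta) * math.cos(phi),
                   math.sin(theta) * math.sin(phi),
                   math.cos(theta))
            z = A * math.cos(theta)
            a = (0.0, 0.0, z**3)
            a_dot_er = sum(ac * ec for ac, ec in zip(a, e_r))
            total += a_dot_er * A**2 * math.sin(theta) * dtheta * dphi
    return total


value = sphere_flux(2.0)
expected = 4 * math.pi * 2.0**5 / 5
```

Only a few steps in $\phi$ are needed here because the integrand does not depend on $\phi$.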

4.6.3 General curvilinear coordinates 

Cylindrical and spherical polar coordinates are two (useful) examples of general 
curvilinear coordinates. In general a point P with Cartesian coordinates x, y, z can 
be expressed in terms of the curvilinear coordinates u, v, w where 


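$$x = x(u,v,w), \quad y = y(u,v,w), \quad z = z(u,v,w)$$

Thus

$$\mathbf{r} = x(u,v,w)\,\hat{\imath} + y(u,v,w)\,\hat{\jmath} + z(u,v,w)\,\hat{k}$$

and

$$\frac{\partial\mathbf{r}}{\partial u} = \frac{\partial x}{\partial u}\hat{\imath} + \frac{\partial y}{\partial u}\hat{\jmath} + \frac{\partial z}{\partial u}\hat{k}$$

and similarly for partials with respect to $v$ and $w$, so

$$d\mathbf{r} = \frac{\partial\mathbf{r}}{\partial u}\,du + \frac{\partial\mathbf{r}}{\partial v}\,dv + \frac{\partial\mathbf{r}}{\partial w}\,dw$$

We now define the local coordinate system as before by considering the directions
in which each coordinate “unilaterally” (and instantaneously) increases:

$$\mathbf{e}_u = \frac{\partial\mathbf{r}}{\partial u} = \left|\frac{\partial\mathbf{r}}{\partial u}\right| \hat{\mathbf{e}}_u = h_u \hat{\mathbf{e}}_u$$

$$\mathbf{e}_v = \frac{\partial\mathbf{r}}{\partial v} = \left|\frac{\partial\mathbf{r}}{\partial v}\right| \hat{\mathbf{e}}_v = h_v \hat{\mathbf{e}}_v$$

$$\mathbf{e}_w = \frac{\partial\mathbf{r}}{\partial w} = \left|\frac{\partial\mathbf{r}}{\partial w}\right| \hat{\mathbf{e}}_w = h_w \hat{\mathbf{e}}_w$$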
The metric coefficients are therefore $h_u = \left|\dfrac{\partial\mathbf{r}}{\partial u}\right|$, $h_v = \left|\dfrac{\partial\mathbf{r}}{\partial v}\right|$ and $h_w = \left|\dfrac{\partial\mathbf{r}}{\partial w}\right|$.

A volume element is in general given by

$$dV = h_u\, du\, \hat{\mathbf{e}}_u \cdot (h_v\, dv\, \hat{\mathbf{e}}_v \times h_w\, dw\, \hat{\mathbf{e}}_w)$$

and simplifies if the coordinate system is orthogonal (since then $\hat{\mathbf{e}}_u \cdot (\hat{\mathbf{e}}_v \times \hat{\mathbf{e}}_w) = 1$) to

$$dV = h_u h_v h_w\, du\, dv\, dw$$

A surface element (normal to constant $w$, say) is in general

$$d\mathbf{S} = h_u\, du\, \hat{\mathbf{e}}_u \times h_v\, dv\, \hat{\mathbf{e}}_v$$

and simplifies if the coordinate system is orthogonal to

$$d\mathbf{S} = h_u h_v\, du\, dv\, \hat{\mathbf{e}}_w$$
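The general recipe lends itself to a numerical sketch: given any transformation $(u,v,w) \to (x,y,z)$, estimate the tangent vectors $\partial\mathbf{r}/\partial u$ etc. by central differences, take their lengths to get the metric coefficients, and form the scalar triple product to get the volume factor. Spherical polars are used as the test case below; all function names and the sample point are illustrative assumptions.

```python
import math


def spherical_to_cartesian(r, theta, phi):
    s = math.sin(theta)
    return (r * s * math.cos(phi), r * s * math.sin(phi), r * math.cos(theta))


def tangent_vectors(transform, q, eps=1e-6):
    """Central-difference estimates of dr/du, dr/dv and dr/dw at q."""
    vectors = []
    for k in range(3):
        lo, hi = list(q), list(q)
        lo[k] -= eps
        hi[k] += eps
        p_lo, p_hi = transform(*lo), transform(*hi)
        vectors.append(tuple((b - a) / (2 * eps) for a, b in zip(p_lo, p_hi)))
    return vectors


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


q = (2.0, 0.6, 1.1)  # arbitrary sample point (r, theta, phi)
e_u, e_v, e_w = tangent_vectors(spherical_to_cartesian, q)
h = [math.sqrt(dot(e, e)) for e in (e_u, e_v, e_w)]   # metric coefficients
volume_factor = dot(e_u, cross(e_v, e_w))             # dV / (du dv dw)
```

For spherical polars the estimates should be close to $(1, r, r\sin\theta)$ and $r^2\sin\theta$; swapping in a different `transform` gives the coefficients for another coordinate system.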


4.6.4 Summary 


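To summarise:


General curvilinear coordinates

$x = x(u,v,w), \quad y = y(u,v,w), \quad z = z(u,v,w)$
$\mathbf{r} = x(u,v,w)\,\hat{\imath} + y(u,v,w)\,\hat{\jmath} + z(u,v,w)\,\hat{k}$
$h_u = \left|\dfrac{\partial\mathbf{r}}{\partial u}\right|, \quad h_v = \left|\dfrac{\partial\mathbf{r}}{\partial v}\right|, \quad h_w = \left|\dfrac{\partial\mathbf{r}}{\partial w}\right|$
$\hat{\mathbf{u}} = \dfrac{1}{h_u}\dfrac{\partial\mathbf{r}}{\partial u}, \quad \hat{\mathbf{v}} = \dfrac{1}{h_v}\dfrac{\partial\mathbf{r}}{\partial v}, \quad \hat{\mathbf{w}} = \dfrac{1}{h_w}\dfrac{\partial\mathbf{r}}{\partial w}$
$d\mathbf{r} = h_u\, du\, \hat{\mathbf{u}} + h_v\, dv\, \hat{\mathbf{v}} + h_w\, dw\, \hat{\mathbf{w}}$
$dV = h_u h_v h_w\, du\, dv\, dw\; \hat{\mathbf{u}} \cdot (\hat{\mathbf{v}} \times \hat{\mathbf{w}})$
$d\mathbf{S} = h_u h_v\, du\, dv\; \hat{\mathbf{u}} \times \hat{\mathbf{v}}$ (for surface element tangent to constant $w$)


Plane polar coordinates

$x = r\cos\theta, \quad y = r\sin\theta$
$\mathbf{r} = r\cos\theta\,\hat{\imath} + r\sin\theta\,\hat{\jmath}$
$h_r = 1, \quad h_\theta = r$
$\hat{\mathbf{e}}_r = \cos\theta\,\hat{\imath} + \sin\theta\,\hat{\jmath}, \quad \hat{\mathbf{e}}_\theta = -\sin\theta\,\hat{\imath} + \cos\theta\,\hat{\jmath}$
$d\mathbf{r} = dr\,\hat{\mathbf{e}}_r + r\, d\theta\,\hat{\mathbf{e}}_\theta$
$d\mathbf{S} = r\, dr\, d\theta\, \hat{k}$


Cylindrical polar coordinates

$x = r\cos\phi, \quad y = r\sin\phi, \quad z = z$
$\mathbf{r} = r\cos\phi\,\hat{\imath} + r\sin\phi\,\hat{\jmath} + z\hat{k}$
$h_r = 1, \quad h_\phi = r, \quad h_z = 1$
$\hat{\mathbf{e}}_r = \cos\phi\,\hat{\imath} + \sin\phi\,\hat{\jmath}, \quad \hat{\mathbf{e}}_\phi = -\sin\phi\,\hat{\imath} + \cos\phi\,\hat{\jmath}, \quad \hat{\mathbf{e}}_z = \hat{k}$
$d\mathbf{r} = dr\,\hat{\mathbf{e}}_r + r\, d\phi\,\hat{\mathbf{e}}_\phi + dz\,\hat{\mathbf{e}}_z$
$d\mathbf{S} = r\, dr\, d\phi\, \hat{k}$ (on the flat ends)
$d\mathbf{S} = r\, d\phi\, dz\, \hat{\mathbf{e}}_r$ (on the curved sides)
$dV = r\, dr\, d\phi\, dz$


Spherical polar coordinates

$x = r\sin\theta\cos\phi, \quad y = r\sin\theta\sin\phi, \quad z = r\cos\theta$
$\mathbf{r} = r\sin\theta\cos\phi\,\hat{\imath} + r\sin\theta\sin\phi\,\hat{\jmath} + r\cos\theta\,\hat{k}$
$h_r = 1, \quad h_\theta = r, \quad h_\phi = r\sin\theta$
$\hat{\mathbf{e}}_r = \sin\theta\cos\phi\,\hat{\imath} + \sin\theta\sin\phi\,\hat{\jmath} + \cos\theta\,\hat{k}$
$\hat{\mathbf{e}}_\theta = \cos\theta\cos\phi\,\hat{\imath} + \cos\theta\sin\phi\,\hat{\jmath} - \sin\theta\,\hat{k}$
$\hat{\mathbf{e}}_\phi = -\sin\phi\,\hat{\imath} + \cos\phi\,\hat{\jmath}$
$d\mathbf{r} = dr\,\hat{\mathbf{e}}_r + r\, d\theta\,\hat{\mathbf{e}}_\theta + r\sin\theta\, d\phi\,\hat{\mathbf{e}}_\phi$
$d\mathbf{S} = r^2\sin\theta\, d\theta\, d\phi\, \hat{\mathbf{e}}_r$ (on a spherical surface)
$dV = r^2\sin\theta\, dr\, d\theta\, d\phi$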
Lecture 5


Vector Operators: Grad, Div and Curl


In the first lecture of the second part of this course we move on to consider
properties of fields. We introduce three field operators which reveal interesting
collective field properties, viz.

• the gradient of a scalar field, 

• the divergence of a vector field, and 

• the curl of a vector field. 

There are two points to get over about each: 

• The mechanics of taking the grad, div or curl, for which you will need to brush 
up your multivariate calculus. 

• The underlying physical meaning — that is, why they are worth bothering 
about. 

In Lecture 6 we will look at combining these vector operators. 


5.1 The gradient of a scalar field 

Recall the discussion of temperature distribution throughout a room in the overview,
where we wondered how a scalar would vary as we moved off in an arbitrary direction.
Here we find out how.

If $U(x,y,z)$ is a scalar field, i.e. a scalar function of position $\mathbf{r} = [x,y,z]$ in 3
dimensions, then its gradient at any point is defined in Cartesian co-ordinates by

$$\mathrm{grad}\,U = \frac{\partial U}{\partial x}\hat{\imath} + \frac{\partial U}{\partial y}\hat{\jmath} + \frac{\partial U}{\partial z}\hat{k} .$$

It is usual to define the vector operator which is called “del” or “nabla”

$$\nabla = \hat{\imath}\frac{\partial}{\partial x} + \hat{\jmath}\frac{\partial}{\partial y} + \hat{k}\frac{\partial}{\partial z} .$$

Then

$$\mathrm{grad}\,U = \nabla U .$$

Note immediately that $\nabla U$ is a vector field!

Without thinking too carefully about it, we can see that the gradient of a scalar 
field tends to point in the direction of greatest change of the field. Later we will 
be more precise. 

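Worked examples of gradient evaluation

1. $U = x^2$

$$\nabla U = \left(\frac{\partial}{\partial x}\hat{\imath} + \frac{\partial}{\partial y}\hat{\jmath} + \frac{\partial}{\partial z}\hat{k}\right) x^2 = 2x\,\hat{\imath}$$

2. $U = r^2$, where $r^2 = x^2 + y^2 + z^2$

$$\nabla U = \left(\frac{\partial}{\partial x}\hat{\imath} + \frac{\partial}{\partial y}\hat{\jmath} + \frac{\partial}{\partial z}\hat{k}\right)(x^2 + y^2 + z^2) = 2x\,\hat{\imath} + 2y\,\hat{\jmath} + 2z\,\hat{k} = 2\mathbf{r} .$$

3. $U = \mathbf{c} \cdot \mathbf{r}$, where $\mathbf{c}$ is constant.

$$\nabla U = \left(\hat{\imath}\frac{\partial}{\partial x} + \hat{\jmath}\frac{\partial}{\partial y} + \hat{k}\frac{\partial}{\partial z}\right)(c_1 x + c_2 y + c_3 z) = c_1\hat{\imath} + c_2\hat{\jmath} + c_3\hat{k} = \mathbf{c}$$

4. $U = f(r)$, where $r = \sqrt{x^2 + y^2 + z^2}$

$U$ is a function of $r$ alone so $df/dr$ exists. As $U = f(x, y, z)$ also,

$$\frac{\partial f}{\partial x} = \frac{df}{dr}\frac{\partial r}{\partial x}, \qquad \frac{\partial f}{\partial y} = \frac{df}{dr}\frac{\partial r}{\partial y}, \qquad \frac{\partial f}{\partial z} = \frac{df}{dr}\frac{\partial r}{\partial z}$$

$$\Rightarrow \nabla U = \frac{\partial f}{\partial x}\hat{\imath} + \frac{\partial f}{\partial y}\hat{\jmath} + \frac{\partial f}{\partial z}\hat{k} = \frac{df}{dr}\left(\frac{\partial r}{\partial x}\hat{\imath} + \frac{\partial r}{\partial y}\hat{\jmath} + \frac{\partial r}{\partial z}\hat{k}\right)$$

But $r = \sqrt{x^2 + y^2 + z^2}$, so $\partial r/\partial x = x/r$ and similarly for $y$, $z$.

$$\Rightarrow \nabla U = \frac{df}{dr}\left(\frac{x\hat{\imath} + y\hat{\jmath} + z\hat{k}}{r}\right) = \frac{df}{dr}\left(\frac{\mathbf{r}}{r}\right)$$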
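Results like these are easy to sanity-check with a finite-difference gradient. The sketch below compares a numerical gradient with examples 2 and 4; the choice $f(r) = \sin r$ and the sample point are arbitrary illustrations, not taken from the notes.

```python
import math


def grad(U, p, eps=1e-6):
    """Central-difference estimate of grad U at the point p = (x, y, z)."""
    g = []
    for k in range(3):
        lo, hi = list(p), list(p)
        lo[k] -= eps
        hi[k] += eps
        g.append((U(*hi) - U(*lo)) / (2 * eps))
    return tuple(g)


def r_squared(x, y, z):
    return x * x + y * y + z * z


def f_of_r(x, y, z):
    # Example 4 with the arbitrary choice f(r) = sin(r), so df/dr = cos(r)
    return math.sin(math.sqrt(x * x + y * y + z * z))


p = (1.0, -2.0, 0.5)
r = math.sqrt(r_squared(*p))

grad_r2 = grad(r_squared, p)                      # should be close to 2 r
grad_f = grad(f_of_r, p)                          # should be close to f'(r) r / r
expected_f = tuple(math.cos(r) * c / r for c in p)
```

The second comparison illustrates that the gradient of any $f(r)$ points along $\mathbf{r}$, with magnitude $|df/dr|$.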


Figure 5.1: The directional derivative


5.2 The significance of grad 

If our current position is $\mathbf{r}$ in some scalar field $U$ (Fig. 5.1), and we move an
infinitesimal distance $d\mathbf{r}$, we know that the change in $U$ is

$$dU = \frac{\partial U}{\partial x}dx + \frac{\partial U}{\partial y}dy + \frac{\partial U}{\partial z}dz .$$

But we know that $d\mathbf{r} = (\hat{\imath}\,dx + \hat{\jmath}\,dy + \hat{k}\,dz)$ and
$\nabla U = (\hat{\imath}\,\partial U/\partial x + \hat{\jmath}\,\partial U/\partial y + \hat{k}\,\partial U/\partial z)$, so that the change in $U$ is also given by the scalar product

$$dU = \nabla U \cdot d\mathbf{r} .$$

Now divide both sides by $ds$:

$$\frac{dU}{ds} = \nabla U \cdot \frac{d\mathbf{r}}{ds} .$$

But remember that $|d\mathbf{r}| = ds$, so $d\mathbf{r}/ds$ is a unit vector in the direction of $d\mathbf{r}$.

This result can be paraphrased as:

• grad $U$ has the property that the rate of change of $U$ with respect to distance in a
particular direction ($\hat{\mathbf{d}}$) is the projection of grad $U$ onto that direction
(or the component of grad $U$ in that direction).

The quantity $dU/ds$ is called a directional derivative, but note that in general it
has a different value for each direction, and so has no meaning until you specify
the direction.

We could also say that

• At any point P, grad $U$ points in the direction of greatest change of
$U$ at P, and has magnitude equal to the rate of change of $U$ with respect to
distance in that direction.
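The projection property can be illustrated numerically: step a small distance $\pm ds$ along a chosen unit direction, form the difference quotient for $dU/ds$, and compare it with $\nabla U \cdot \hat{\mathbf{d}}$. The field $U = x^2 y + z$, the point and the direction below are arbitrary examples chosen for this sketch.

```python
import math


def U(x, y, z):
    # Illustrative scalar field (an assumption, not from the notes)
    return x * x * y + z


def grad_U(x, y, z):
    # Analytic gradient of the field above
    return (2 * x * y, x * x, 1.0)


def directional_derivative(p, d, ds=1e-6):
    """Central-difference estimate of dU/ds at p along the direction d.

    Returns the estimate together with the unit vector d_hat.
    """
    length = math.sqrt(sum(c * c for c in d))
    d_hat = tuple(c / length for c in d)
    ahead = tuple(pc + ds * dc for pc, dc in zip(p, d_hat))
    behind = tuple(pc - ds * dc for pc, dc in zip(p, d_hat))
    return (U(*ahead) - U(*behind)) / (2 * ds), d_hat


p = (1.0, 2.0, 3.0)
numeric, d_hat = directional_derivative(p, (1.0, 2.0, 2.0))
projection = sum(g * c for g, c in zip(grad_U(*p), d_hat))   # grad U . d_hat
grad_magnitude = math.sqrt(sum(g * g for g in grad_U(*p)))
```

Trying other directions should never give a value larger than `grad_magnitude`, which is the statement that grad $U$ points in the direction of greatest change.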



Another nice property emerges if we think of a surface of constant U - that is the 
locus (x, y, z) for 


U(x, y, z) = constant . 


If we move a tiny amount within that iso-$U$ surface, there is no change in $U$, so
$dU/ds = 0$. So for any $d\mathbf{r}/ds$ in the surface

$$\nabla U \cdot \frac{d\mathbf{r}}{ds} = 0 .$$

But $d\mathbf{r}/ds$ is a tangent to the surface, so this result shows that

• grad $U$ is everywhere NORMAL to a surface of constant $U$.

Surfaces of constant $U$ are called level surfaces.
5.3 The divergence of a vector field 

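The divergence computes a scalar quantity from a vector field by differentiation.
If $\mathbf{a}(x, y, z)$ is a vector function of position in 3 dimensions, that is
$\mathbf{a} = a_1\hat{\imath} + a_2\hat{\jmath} + a_3\hat{k}$, then its divergence at any point is defined in Cartesian co-ordinates by

$$\mathrm{div}\,\mathbf{a} = \frac{\partial a_1}{\partial x} + \frac{\partial a_2}{\partial y} + \frac{\partial a_3}{\partial z}$$

We can write this in a simplified notation using a scalar product with the $\nabla$ vector
differential operator:

$$\mathrm{div}\,\mathbf{a} = \left(\hat{\imath}\frac{\partial}{\partial x} + \hat{\jmath}\frac{\partial}{\partial y} + \hat{k}\frac{\partial}{\partial z}\right) \cdot \mathbf{a} = \nabla \cdot \mathbf{a}$$

Notice that the divergence of a vector field is a scalar field.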


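Examples of divergence evaluation

     $\mathbf{a}$                                          div $\mathbf{a}$
1)   $x\hat{\imath}$                                       1
2)   $\mathbf{r}\ (= x\hat{\imath} + y\hat{\jmath} + z\hat{k})$      3
3)   $\mathbf{r}/r^3$                                      0
4)   $r\mathbf{c}$, for $\mathbf{c}$ constant              $(\mathbf{r} \cdot \mathbf{c})/r$

We work through example 3).

The $x$ component of $\mathbf{r}/r^3$ is $x\,(x^2 + y^2 + z^2)^{-3/2}$, and we need to find $\partial/\partial x$ of it.

$$\frac{\partial}{\partial x}\left[x\,(x^2 + y^2 + z^2)^{-3/2}\right] = 1 \cdot (x^2 + y^2 + z^2)^{-3/2} + x\left(-\frac{3}{2}\right)(x^2 + y^2 + z^2)^{-5/2} \cdot 2x = r^{-3}\left(1 - 3x^2 r^{-2}\right) .$$

The terms in $y$ and $z$ are similar, so that

$$\mathrm{div}(\mathbf{r}/r^3) = r^{-3}\left(3 - 3(x^2 + y^2 + z^2)\,r^{-2}\right) = r^{-3}(3 - 3) = 0$$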
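The entries in the table can be spot-checked with a finite-difference divergence evaluated away from the origin. This is a sketch only; the sample point and the constant vector $\mathbf{c} = (1, 2, 3)$ are arbitrary assumptions.

```python
import math


def divergence(a, p, eps=1e-5):
    """Central-difference estimate of div a at the point p = (x, y, z)."""
    total = 0.0
    for k in range(3):
        lo, hi = list(p), list(p)
        lo[k] -= eps
        hi[k] += eps
        total += (a(*hi)[k] - a(*lo)[k]) / (2 * eps)
    return total


def position(x, y, z):
    return (x, y, z)                       # example 2: a = r


def r_over_r3(x, y, z):
    r3 = (x * x + y * y + z * z) ** 1.5
    return (x / r3, y / r3, z / r3)        # example 3: a = r / r^3


c = (1.0, 2.0, 3.0)                        # arbitrary constant vector


def r_times_c(x, y, z):
    r = math.sqrt(x * x + y * y + z * z)
    return tuple(r * ck for ck in c)       # example 4: a = r c


p = (1.0, 2.0, -1.0)
div_position = divergence(position, p)
div_inverse_square = divergence(r_over_r3, p)
div_rc = divergence(r_times_c, p)
expected_rc = sum(pk * ck for pk, ck in zip(p, c)) / math.sqrt(6.0)
```

Note that the check for $\mathbf{r}/r^3$ is only meaningful for $r \neq 0$, where the field is differentiable.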


5.4 The significance of div 

Consider a typical vector field, water flow, and denote it by a(r). This vector has 
magnitude equal to the mass of water crossing a unit area perpendicular to the 
direction of a per unit time. 

Now take an infinitesimal volume element dV and figure out the balance of the 
flow of a in and out of dV. 
To be specific, consider the volume element $dV = dx\,dy\,dz$ in Cartesian
co-ordinates, and think first about the face of area $dx\,dz$ perpendicular to the $y$ axis
and facing outwards in the negative $y$ direction. (That is, the one with surface
area $d\mathbf{S} = -dx\,dz\,\hat{\jmath}$.)

Figure 5.2: Elemental volume for calculating divergence.

The component of the vector $\mathbf{a}$ normal to this face is $\mathbf{a} \cdot \hat{\jmath} = a_y$, and is pointing
inwards, and so its contribution to the OUTWARD flux from this surface is

$$\mathbf{a} \cdot d\mathbf{S} = -a_y(y)\,dz\,dx ,$$

where $a_y(y)$ means that $a_y$ is a function of $y$. (By the way, flux here denotes mass
per unit time.)

A similar contribution, but of opposite sign, will arise from the opposite face, but
we must remember that we have moved along $y$ by an amount $dy$, so that this
OUTWARD amount is

$$a_y(y + dy)\,dz\,dx = \left(a_y(y) + \frac{\partial a_y}{\partial y}dy\right)dz\,dx .$$

The total outward amount from these two faces is

$$\frac{\partial a_y}{\partial y}\,dy\,dx\,dz = \frac{\partial a_y}{\partial y}\,dV .$$

Summing the other faces gives a total outward flux of

$$\left(\frac{\partial a_x}{\partial x} + \frac{\partial a_y}{\partial y} + \frac{\partial a_z}{\partial z}\right)dV = \nabla \cdot \mathbf{a}\; dV .$$

So we see that

The divergence of a vector field represents the flux generation per unit
volume at each point of the field. (Divergence because it is an efflux not
an influx.)

Interestingly we also saw that the total efflux from the infinitesimal volume was 
equal to the flux integrated over the surface of the volume. 

(NB: The above does not constitute a rigorous proof of the assertion because we 
have not proved that the quantity calculated is independent of the co-ordinate 
system used, but it will suffice for our purposes.) 

5.5 The Laplacian: div(gradU) of a scalar field 

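Recall that grad $U$ of any scalar field $U$ is a vector field. Recall also that we
can compute the divergence of any vector field. So we can certainly compute
div(grad $U$), even if we don’t know what it means yet.

Here is where the $\nabla$ operator starts to be really handy.

$$\nabla \cdot (\nabla U) = \left(\hat{\imath}\frac{\partial}{\partial x} + \hat{\jmath}\frac{\partial}{\partial y} + \hat{k}\frac{\partial}{\partial z}\right) \cdot \left(\hat{\imath}\frac{\partial U}{\partial x} + \hat{\jmath}\frac{\partial U}{\partial y} + \hat{k}\frac{\partial U}{\partial z}\right) = \left(\frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + \frac{\partial^2}{\partial z^2}\right)U$$

This last expression occurs frequently in engineering science (you will meet it next
in solving Laplace's Equation in partial differential equations). For this reason, the
operator $\nabla^2$ is called the “Laplacian”:

$$\nabla^2 U = \left(\frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + \frac{\partial^2}{\partial z^2}\right)U$$

Laplace’s equation itself is

$$\nabla^2 U = 0$$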
Examples of $\nabla^2 U$ evaluation

     $U$                               $\nabla^2 U$
1)   $r^2\ (= x^2 + y^2 + z^2)$        6
2)   $x y^2 z^3$                       $2xz^3 + 6xy^2 z$
3)   $1/r$                             0

Let’s prove example (3) (which is particularly significant - can you guess why?).

$$1/r = (x^2 + y^2 + z^2)^{-1/2}$$

$$\begin{aligned}
\frac{\partial}{\partial x}\frac{\partial}{\partial x}(x^2 + y^2 + z^2)^{-1/2} &= \frac{\partial}{\partial x}\left[-x\,(x^2 + y^2 + z^2)^{-3/2}\right] \\
&= -(x^2 + y^2 + z^2)^{-3/2} + 3x \cdot x\,(x^2 + y^2 + z^2)^{-5/2} \\
&= (1/r^3)\left(-1 + 3x^2/r^2\right)
\end{aligned}$$

Adding up similar terms for $y$ and $z$

$$\nabla^2 \frac{1}{r} = \frac{1}{r^3}\left(-3 + 3\,\frac{x^2 + y^2 + z^2}{r^2}\right) = 0$$
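The three table entries can be checked with a second-difference approximation to the Laplacian. The sketch below uses the standard three-point formula in each direction; the step size and the sample point (chosen away from the origin) are arbitrary.

```python
import math


def laplacian(U, p, eps=1e-3):
    """Second-difference estimate of the Laplacian of U at p = (x, y, z)."""
    centre = U(*p)
    total = 0.0
    for k in range(3):
        lo, hi = list(p), list(p)
        lo[k] -= eps
        hi[k] += eps
        total += (U(*hi) - 2.0 * centre + U(*lo)) / (eps * eps)
    return total


def r_squared(x, y, z):
    return x * x + y * y + z * z


def xy2z3(x, y, z):
    return x * y**2 * z**3


def one_over_r(x, y, z):
    return 1.0 / math.sqrt(x * x + y * y + z * z)


p = (1.0, 2.0, -1.0)
lap_r2 = laplacian(r_squared, p)        # table entry: 6
lap_poly = laplacian(xy2z3, p)          # table entry: 2 x z^3 + 6 x y^2 z
lap_inv_r = laplacian(one_over_r, p)    # table entry: 0 (for r != 0)
expected_poly = 2 * p[0] * p[2]**3 + 6 * p[0] * p[1]**2 * p[2]
```

As with $\mathbf{r}/r^3$, the result for $1/r$ holds only away from $r = 0$.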


5.6 The curl of a vector field 

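So far we have seen the operator $\nabla$ applied to a scalar field, $\nabla U$, and dotted with
a vector field, $\nabla \cdot \mathbf{a}$.

We are now overwhelmed by an irresistible temptation to
• cross it with a vector field, $\nabla \times \mathbf{a}$

This gives the curl of a vector field

$$\nabla \times \mathbf{a} = \mathrm{curl}(\mathbf{a})$$

We can follow the pseudo-determinant recipe for vector products, so that

$$\nabla \times \mathbf{a} = \begin{vmatrix} \hat{\imath} & \hat{\jmath} & \hat{k} \\ \dfrac{\partial}{\partial x} & \dfrac{\partial}{\partial y} & \dfrac{\partial}{\partial z} \\ a_x & a_y & a_z \end{vmatrix} \quad \text{(remember it this way)}$$

$$= \left(\frac{\partial a_z}{\partial y} - \frac{\partial a_y}{\partial z}\right)\hat{\imath} + \left(\frac{\partial a_x}{\partial z} - \frac{\partial a_z}{\partial x}\right)\hat{\jmath} + \left(\frac{\partial a_y}{\partial x} - \frac{\partial a_x}{\partial y}\right)\hat{k}$$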
Examples of curl evaluation

     $\mathbf{a}$                          $\nabla \times \mathbf{a}$
1)   $-y\hat{\imath} + x\hat{\jmath}$      $2\hat{k}$
2)   $x^2 y^2 \hat{k}$                     $2x^2 y\,\hat{\imath} - 2xy^2\,\hat{\jmath}$
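Both examples can be reproduced with a finite-difference curl built directly from the component formula $(\partial a_z/\partial y - \partial a_y/\partial z,\ \partial a_x/\partial z - \partial a_z/\partial x,\ \partial a_y/\partial x - \partial a_x/\partial y)$. The function names and the sample point are illustrative assumptions.

```python
def curl(a, p, eps=1e-6):
    """Central-difference estimate of curl a at the point p = (x, y, z)."""

    def d(i, k):
        # Partial derivative of component a_i with respect to coordinate k
        lo, hi = list(p), list(p)
        lo[k] -= eps
        hi[k] += eps
        return (a(*hi)[i] - a(*lo)[i]) / (2 * eps)

    return (d(2, 1) - d(1, 2),
            d(0, 2) - d(2, 0),
            d(1, 0) - d(0, 1))


def rotation_field(x, y, z):
    return (-y, x, 0.0)                 # example 1


def k_field(x, y, z):
    return (0.0, 0.0, x**2 * y**2)      # example 2


p = (1.5, -0.5, 2.0)
curl_1 = curl(rotation_field, p)        # expect (0, 0, 2)
curl_2 = curl(k_field, p)               # expect (2 x^2 y, -2 x y^2, 0)
expected_2 = (2 * p[0]**2 * p[1], -2 * p[0] * p[1]**2, 0.0)
```

The first field has the same curl everywhere, which is consistent with its interpretation as rigid rotation in the next section.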


5.7 The significance of curl 

Perhaps the first example gives a clue. The field $\mathbf{a} = -y\hat{\imath} + x\hat{\jmath}$ is sketched in
Figure 5.3(a). (It is the field you would calculate as the velocity field of an object
rotating with $\boldsymbol{\omega} = [0, 0, 1]$.) This field has a curl of $2\hat{k}$, which is in the right-hand screw
sense out of the page. You can also see that a field like this must give a finite
value to the line integral around the complete loop, $\oint_C \mathbf{a} \cdot d\mathbf{r}$.

Figure 5.3: (a) A rough sketch of the vector field $-y\hat{\imath} + x\hat{\jmath}$. (b) An element in which to calculate
curl.

In fact curl is closely related to the line integral around a loop.

The circulation of a vector $\mathbf{a}$ round any closed curve C is defined to be
$\oint_C \mathbf{a} \cdot d\mathbf{r}$
and the curl of the vector field $\mathbf{a}$ represents the vorticity, or circulation
per unit area, of the field.
Our proof uses the small rectangular element dx by dy shown in Figure 5.3(b). 
Consider the circulation round the perimeter of a rectangular element. 

The fields in the $x$ direction at the bottom and top are

$$a_x(y) \quad\text{and}\quad a_x(y + dy) = a_x(y) + \frac{\partial a_x}{\partial y}dy ,$$

where $a_x(y)$ denotes that $a_x$ is a function of $y$, and the fields in the $y$ direction at the
left and right are

$$a_y(x) \quad\text{and}\quad a_y(x + dx) = a_y(x) + \frac{\partial a_y}{\partial x}dx .$$

Starting at the bottom and working round in the anticlockwise sense, the four
contributions to the circulation $dC$ are therefore as follows, where the minus signs
take account of the path being opposed to the field:

$$\begin{aligned}
dC &= +[a_x(y)\,dx] + [a_y(x + dx)\,dy] - [a_x(y + dy)\,dx] - [a_y(x)\,dy] \\
&= +[a_x(y)\,dx] + \left[\left(a_y(x) + \frac{\partial a_y}{\partial x}dx\right)dy\right] - \left[\left(a_x(y) + \frac{\partial a_x}{\partial y}dy\right)dx\right] - [a_y(x)\,dy] \\
&= \left(\frac{\partial a_y}{\partial x} - \frac{\partial a_x}{\partial y}\right)dx\,dy \\
&= (\nabla \times \mathbf{a}) \cdot d\mathbf{S}
\end{aligned}$$

where $d\mathbf{S} = dx\,dy\,\hat{k}$.

NB: Again, this is not a completely rigorous proof as we have not shown that the
result is independent of the co-ordinate system used.
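The four-term expression for $dC$ can be evaluated directly for a small but finite rectangle and compared with $(\nabla \times \mathbf{a})_z\, dx\, dy$. The planar field $\mathbf{a} = -y^3\hat{\imath} + x^3\hat{\jmath}$, the corner point and the element size below are arbitrary choices for this sketch; for this field $(\nabla \times \mathbf{a})_z = 3x^2 + 3y^2$.

```python
def a(x, y):
    # Illustrative planar field: a_x = -y^3, a_y = x^3
    return (-y**3, x**3)


def circulation_rectangle(x, y, dx, dy):
    """Circulation round the rectangle with corner (x, y) and sides dx, dy,
    built from the same four contributions as in the text (anticlockwise)."""
    bottom = a(x, y)[0] * dx           # +a_x(y) dx
    right = a(x + dx, y)[1] * dy       # +a_y(x + dx) dy
    top = -a(x, y + dy)[0] * dx        # -a_x(y + dy) dx
    left = -a(x, y)[1] * dy            # -a_y(x) dy
    return bottom + right + top + left


x0, y0, dx, dy = 1.0, 2.0, 1e-4, 1e-4
dC = circulation_rectangle(x0, y0, dx, dy)
circulation_per_area = dC / (dx * dy)
curl_z = 3 * x0**2 + 3 * y0**2         # analytic (curl a)_z at the corner
```

The ratio `circulation_per_area` differs from `curl_z` only by terms that vanish as the element shrinks, which is the sense in which curl is circulation per unit area.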


5.8 Some definitions involving div, curl and grad 

• A vector field with zero divergence is said to be solenoidal. 

• A vector field with zero curl is said to be irrotational. 

• A scalar field with zero gradient is said to be, er, constant. 


Revised Oct 2013 
Lecture 6


Vector Operator Identities


In this lecture we look at more complicated identities involving vector operators.
The main thing to appreciate is that the operators behave both as vectors and
as differential operators, so that the usual rules of taking the derivative of, say, a
product must be observed.

There could be a cottage industry inventing vector identities. HLT contains a lot
of them. So why not leave it at that?

First, since grad, div and curl describe key aspects of vector fields, they arise often
in practice, and so the identities can save you a lot of time and hacking of partial
derivatives, as we will see when we consider Maxwell’s equations as an example
later.

Secondly, they help to identify other practically important vector operators. So, 
although this material is a bit dry, the relevance of the identities should become 
clear later in other Engineering courses. 


6.1 Identity 1: curl grad U = 0 


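$$\nabla \times \nabla U = \begin{vmatrix} \hat{\imath} & \hat{\jmath} & \hat{k} \\ \partial/\partial x & \partial/\partial y & \partial/\partial z \\ \partial U/\partial x & \partial U/\partial y & \partial U/\partial z \end{vmatrix}
= \hat{\imath}\left(\frac{\partial^2 U}{\partial y\,\partial z} - \frac{\partial^2 U}{\partial z\,\partial y}\right) + \hat{\jmath}\,(\ldots) + \hat{k}\,(\ldots) = \mathbf{0}$$

as $\partial^2/\partial y\,\partial z = \partial^2/\partial z\,\partial y$.

Note that the output is a null vector.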
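6.2 Identity 2: div curl a = 0

$$\nabla \cdot \nabla \times \mathbf{a} = \begin{vmatrix} \partial/\partial x & \partial/\partial y & \partial/\partial z \\ \partial/\partial x & \partial/\partial y & \partial/\partial z \\ a_x & a_y & a_z \end{vmatrix}$$

$$= \frac{\partial^2 a_z}{\partial x\,\partial y} - \frac{\partial^2 a_y}{\partial x\,\partial z} - \frac{\partial^2 a_z}{\partial y\,\partial x} + \frac{\partial^2 a_x}{\partial y\,\partial z} + \frac{\partial^2 a_y}{\partial z\,\partial x} - \frac{\partial^2 a_x}{\partial z\,\partial y}$$

$$= 0$$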

6.3 Identity 3: div and curl of Ua 

Suppose that $U(\mathbf{r})$ is a scalar field and that $\mathbf{a}(\mathbf{r})$ is a vector field and we are
interested in the product $U\mathbf{a}$. This is a vector field, so we can compute its divergence
and curl. For example the density $\rho(\mathbf{r})$ of a fluid is a scalar field, and the
instantaneous velocity of the fluid $\mathbf{v}(\mathbf{r})$ is a vector field, and we are probably interested
in mass flow rates for which we will be interested in $\rho(\mathbf{r})\mathbf{v}(\mathbf{r})$.

The divergence (a scalar) of the product $U\mathbf{a}$ is given by:

$$\nabla \cdot (U\mathbf{a}) = U(\nabla \cdot \mathbf{a}) + (\nabla U) \cdot \mathbf{a} = U\,\mathrm{div}\,\mathbf{a} + (\mathrm{grad}\,U) \cdot \mathbf{a}$$

In a similar way, we can take the curl of the vector field $U\mathbf{a}$, and the result should
be a vector field:

$$\nabla \times (U\mathbf{a}) = U\,\nabla \times \mathbf{a} + (\nabla U) \times \mathbf{a} .$$


6.4 Identity 4: div of a x b 

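Life quickly gets trickier when vector or scalar products are involved. For example,
it is not that obvious that

$$\mathrm{div}(\mathbf{a} \times \mathbf{b}) = \mathrm{curl}\,\mathbf{a} \cdot \mathbf{b} - \mathbf{a} \cdot \mathrm{curl}\,\mathbf{b}$$

To show this, use the determinant:

$$\begin{aligned}
\mathrm{div}(\mathbf{a} \times \mathbf{b}) &= \begin{vmatrix} \partial/\partial x & \partial/\partial y & \partial/\partial z \\ a_x & a_y & a_z \\ b_x & b_y & b_z \end{vmatrix} \\
&= \frac{\partial}{\partial x}[a_y b_z - a_z b_y] + \frac{\partial}{\partial y}[a_z b_x - a_x b_z] + \frac{\partial}{\partial z}[a_x b_y - a_y b_x] \\
&= \ldots \text{ bash out the products } \ldots \\
&= \mathrm{curl}\,\mathbf{a} \cdot \mathbf{b} - \mathbf{a} \cdot (\mathrm{curl}\,\mathbf{b})
\end{aligned}$$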
6.5 Identity 5: curl(a x b) 


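$$\mathrm{curl}(\mathbf{a} \times \mathbf{b}) = \begin{vmatrix} \hat{\imath} & \hat{\jmath} & \hat{k} \\ \partial/\partial x & \partial/\partial y & \partial/\partial z \\ a_y b_z - a_z b_y & a_z b_x - a_x b_z & a_x b_y - a_y b_x \end{vmatrix}$$

so the $\hat{\imath}$ component is

$$\frac{\partial}{\partial y}(a_x b_y - a_y b_x) - \frac{\partial}{\partial z}(a_z b_x - a_x b_z)$$

which can be written as the sum of four terms:

$$a_x\left(\frac{\partial b_y}{\partial y} + \frac{\partial b_z}{\partial z}\right) - b_x\left(\frac{\partial a_y}{\partial y} + \frac{\partial a_z}{\partial z}\right) + \left(b_y\frac{\partial}{\partial y} + b_z\frac{\partial}{\partial z}\right)a_x - \left(a_y\frac{\partial}{\partial y} + a_z\frac{\partial}{\partial z}\right)b_x$$

Adding $a_x(\partial b_x/\partial x)$ to the first of these, and subtracting it from the last, and
doing the same with $b_x(\partial a_x/\partial x)$ to the other two terms, we find that (you should
of course check this):

$$\nabla \times (\mathbf{a} \times \mathbf{b}) = (\nabla \cdot \mathbf{b})\mathbf{a} - (\nabla \cdot \mathbf{a})\mathbf{b} + [\mathbf{b} \cdot \nabla]\mathbf{a} - [\mathbf{a} \cdot \nabla]\mathbf{b}$$

where $[\mathbf{a} \cdot \nabla]$ can be regarded as a new, and very useful, scalar differential operator.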


6.6 Definition of the operator $[\mathbf{a} \cdot \nabla]$

This is a scalar operator, but it can obviously be applied to a scalar field,
resulting in a scalar field, or to a vector field, resulting in a vector field:

$$[\mathbf{a} \cdot \nabla] \equiv a_x\frac{\partial}{\partial x} + a_y\frac{\partial}{\partial y} + a_z\frac{\partial}{\partial z}$$

6.7 Identity 6: curl(curla) for you to derive 

The following important identity is stated, and left as an exercise: 

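$$\mathrm{curl}(\mathrm{curl}\,\mathbf{a}) = \mathrm{grad}\,\mathrm{div}\,\mathbf{a} - \nabla^2\mathbf{a}$$

where

$$\nabla^2\mathbf{a} = \nabla^2 a_x\,\hat{\imath} + \nabla^2 a_y\,\hat{\jmath} + \nabla^2 a_z\,\hat{k}$$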
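Example of Identity 6: electromagnetic waves

Q: James Clerk Maxwell established a set of four vector equations which are
fundamental to working out how electromagnetic waves propagate. The entire
telecommunications industry is built on these.

$$\mathrm{div}\,\mathbf{D} = \rho$$
$$\mathrm{div}\,\mathbf{B} = 0$$
$$\mathrm{curl}\,\mathbf{E} = -\frac{\partial \mathbf{B}}{\partial t}$$
$$\mathrm{curl}\,\mathbf{H} = \mathbf{J} + \frac{\partial \mathbf{D}}{\partial t}$$

In addition, we can assume the following, which should all be familiar to you:
$\mathbf{B} = \mu_r\mu_0\mathbf{H}$, $\mathbf{J} = \sigma\mathbf{E}$, $\mathbf{D} = \epsilon_r\epsilon_0\mathbf{E}$,
where all the scalars are constants.

Now show that in a material with zero free charge density, $\rho = 0$, and with
zero conductivity, $\sigma = 0$, the electric field $\mathbf{E}$ must be a solution of the wave
equation

$$\nabla^2\mathbf{E} = \mu_r\mu_0\epsilon_r\epsilon_0\,\frac{\partial^2\mathbf{E}}{\partial t^2} .$$

A: First, a bit of respect. Imagine you are the first to do this: this is a tingle
moment.

$$\mathrm{div}\,\mathbf{D} = \mathrm{div}(\epsilon_r\epsilon_0\mathbf{E}) = \epsilon_r\epsilon_0\,\mathrm{div}\,\mathbf{E} = \rho = 0 \;\Rightarrow\; \mathrm{div}\,\mathbf{E} = 0 \qquad (a)$$

$$\mathrm{div}\,\mathbf{B} = \mathrm{div}(\mu_r\mu_0\mathbf{H}) = \mu_r\mu_0\,\mathrm{div}\,\mathbf{H} = 0 \;\Rightarrow\; \mathrm{div}\,\mathbf{H} = 0 \qquad (b)$$

$$\mathrm{curl}\,\mathbf{E} = -\frac{\partial\mathbf{B}}{\partial t} = -\mu_r\mu_0\,\frac{\partial\mathbf{H}}{\partial t} \qquad (c)$$

$$\mathrm{curl}\,\mathbf{H} = \mathbf{J} + \frac{\partial\mathbf{D}}{\partial t} = \mathbf{0} + \epsilon_r\epsilon_0\,\frac{\partial\mathbf{E}}{\partial t} \qquad (d)$$

But we know (or rather you worked out in Identity 6) that curl curl = grad div $- \nabla^2$,
and using (c)

$$\mathrm{curl}\,\mathrm{curl}\,\mathbf{E} = \mathrm{grad}\,\mathrm{div}\,\mathbf{E} - \nabla^2\mathbf{E} = \mathrm{curl}\left(-\mu_r\mu_0\,\frac{\partial\mathbf{H}}{\partial t}\right)$$

so interchanging the order of partial differentiation, and using (a) div $\mathbf{E} = 0$ and then (d):

$$\begin{aligned}
-\nabla^2\mathbf{E} &= -\mu_r\mu_0\,\frac{\partial}{\partial t}(\mathrm{curl}\,\mathbf{H}) \\
&= -\mu_r\mu_0\,\frac{\partial}{\partial t}\left(\epsilon_r\epsilon_0\,\frac{\partial\mathbf{E}}{\partial t}\right) \\
\Rightarrow \nabla^2\mathbf{E} &= \mu_r\mu_0\epsilon_r\epsilon_0\,\frac{\partial^2\mathbf{E}}{\partial t^2}
\end{aligned}$$

This equation is actually three equations, one for each component:

$$\nabla^2 E_x = \mu_r\mu_0\epsilon_r\epsilon_0\,\frac{\partial^2 E_x}{\partial t^2}$$

and so on for $E_y$ and $E_z$.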


6.8 Grad, div, curl and $\nabla^2$ in curvilinear co-ordinate systems

It is possible to obtain general expressions for grad, div and curl in any orthogonal
curvilinear co-ordinate system by making use of the $h$ factors which were introduced
in Lecture 4.

We recall that the unit vector in the direction of increasing $u$, with $v$ and $w$ being
kept constant, is

$$\hat{\mathbf{u}} = \frac{1}{h_u}\frac{\partial\mathbf{r}}{\partial u}$$

where $\mathbf{r}$ is the position vector, and

$$h_u = \left|\frac{\partial\mathbf{r}}{\partial u}\right|$$

is the metric coefficient. Similar expressions apply for the other co-ordinate
directions. Then

$$d\mathbf{r} = h_u\hat{\mathbf{u}}\,du + h_v\hat{\mathbf{v}}\,dv + h_w\hat{\mathbf{w}}\,dw .$$


6.9 Grad in curvilinear coordinates 

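Noting that $U = U(\mathbf{r})$ and $U = U(u, v, w)$, and using the properties of the gradient
of a scalar field obtained previously,

$$\nabla U \cdot d\mathbf{r} = dU = \frac{\partial U}{\partial u}du + \frac{\partial U}{\partial v}dv + \frac{\partial U}{\partial w}dw$$

It follows that

$$\nabla U \cdot (h_u\hat{\mathbf{u}}\,du + h_v\hat{\mathbf{v}}\,dv + h_w\hat{\mathbf{w}}\,dw) = \frac{\partial U}{\partial u}du + \frac{\partial U}{\partial v}dv + \frac{\partial U}{\partial w}dw$$

The only way this can be satisfied for independent $du$, $dv$, $dw$ is when

$$\nabla U = \frac{1}{h_u}\frac{\partial U}{\partial u}\hat{\mathbf{u}} + \frac{1}{h_v}\frac{\partial U}{\partial v}\hat{\mathbf{v}} + \frac{1}{h_w}\frac{\partial U}{\partial w}\hat{\mathbf{w}}$$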
6.10 Divergence in curvilinear coordinates

Expressions can be obtained for the divergence of a vector field in orthogonal
curvilinear co-ordinates by making use of the flux property.

We consider an element of volume $dV$. If the curvilinear coordinates are orthogonal
then the little volume is a cuboid (to first order in small quantities) and

$$dV = h_u h_v h_w\, du\, dv\, dw .$$

However, it is not quite a cuboid: the area of two opposite faces will differ as the
scale parameters are functions of $u$, $v$ and $w$ in general.

Figure 6.1: Elemental volume for calculating divergence in orthogonal curvilinear coordinates

So the net efflux from the two faces in the $v$ direction shown in Figure 6.1 is

$$\left(a_v + \frac{\partial a_v}{\partial v}dv\right)\left(h_u + \frac{\partial h_u}{\partial v}dv\right)\left(h_w + \frac{\partial h_w}{\partial v}dv\right)du\,dw - a_v h_u h_w\, du\, dw = \frac{\partial(a_v h_u h_w)}{\partial v}\, du\, dv\, dw$$

which is easily shown by multiplying the first line out and dropping second order
terms (i.e. $(dv)^2$).

By definition div is the net efflux per unit volume, so summing up the other faces:

$$\mathrm{div}\,\mathbf{a}\; dV = \left[\frac{\partial(a_u h_v h_w)}{\partial u} + \frac{\partial(a_v h_u h_w)}{\partial v} + \frac{\partial(a_w h_u h_v)}{\partial w}\right] du\, dv\, dw$$

$$\Rightarrow \mathrm{div}\,\mathbf{a}\; h_u h_v h_w\, du\, dv\, dw = \left[\frac{\partial(a_u h_v h_w)}{\partial u} + \frac{\partial(a_v h_u h_w)}{\partial v} + \frac{\partial(a_w h_u h_v)}{\partial w}\right] du\, dv\, dw$$

So, finally,

$$\mathrm{div}\,\mathbf{a} = \frac{1}{h_u h_v h_w}\left[\frac{\partial(a_u h_v h_w)}{\partial u} + \frac{\partial(a_v h_u h_w)}{\partial v} + \frac{\partial(a_w h_u h_v)}{\partial w}\right]$$
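The curvilinear divergence formula can be tried out numerically. The sketch below evaluates it by central differences in spherical polars, with $(u,v,w) = (r,\theta,\phi)$ and $h = (1, r, r\sin\theta)$, for two radial fields whose divergences are known from Lecture 5: $\mathbf{a} = \mathbf{r}$ (divergence 3) and $\mathbf{a} = \mathbf{r}/r^3$ (divergence 0). The function names and the sample point are illustrative assumptions.

```python
import math


def div_curvilinear(a, h, q, eps=1e-5):
    """Divergence in orthogonal curvilinear coordinates at q = (u, v, w).

    a(u, v, w) returns the components (a_u, a_v, a_w) along the unit vectors;
    h(u, v, w) returns the metric coefficients (h_u, h_v, h_w).
    """
    def flux_density(k, point):
        # a_k multiplied by the two metric coefficients of the other directions
        comps, hs = a(*point), h(*point)
        others = [hs[m] for m in range(3) if m != k]
        return comps[k] * others[0] * others[1]

    total = 0.0
    for k in range(3):
        lo, hi = list(q), list(q)
        lo[k] -= eps
        hi[k] += eps
        total += (flux_density(k, hi) - flux_density(k, lo)) / (2 * eps)
    h_u, h_v, h_w = h(*q)
    return total / (h_u * h_v * h_w)


def h_spherical(r, theta, phi):
    return (1.0, r, r * math.sin(theta))


def position_field(r, theta, phi):
    return (r, 0.0, 0.0)             # a = r e_r, i.e. the position vector


def inverse_square_field(r, theta, phi):
    return (1.0 / r**2, 0.0, 0.0)    # a = e_r / r^2, i.e. r / r^3


q = (2.0, 0.6, 1.1)
div_position = div_curvilinear(position_field, h_spherical, q)
div_inverse_square = div_curvilinear(inverse_square_field, h_spherical, q)
```

Passing a different `h` function (for example `(1, r, 1)` for cylindrical polars) applies the same formula in another coordinate system.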


6.11 Curl in curvilinear coordinates 

Recall from Lecture 5 that we computed the $z$ component of curl as the circulation
per unit area from

$$dC = \left(\frac{\partial a_y}{\partial x} - \frac{\partial a_x}{\partial y}\right)dx\,dy$$

By analogy with our derivation of divergence, you will realize that for an orthogonal
curvilinear coordinate system we can write the area as $h_u h_v\, du\, dv$. But the opposite
sides are no longer quite of the same length. The lower of the pair in Figure 6.2
is of length $h_u(v)\,du$, but the upper is of length $h_u(v + dv)\,du$.

Figure 6.2: Elemental loop for calculating curl in orthogonal curvilinear coordinates

Summing this pair gives a contribution to the circulation

$$a_u(v)h_u(v)\,du - a_u(v + dv)h_u(v + dv)\,du = -\frac{\partial(h_u a_u)}{\partial v}\,dv\,du$$

and together with the other pair:

$$dC = \left(-\frac{\partial(h_u a_u)}{\partial v} + \frac{\partial(h_v a_v)}{\partial u}\right)du\,dv$$
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So the circulation per unit area is 

dC 1 / d(h v a v ) d{h u a u ) 


h u h v dudv h u h v 
and hence curl is 

curia (u, \/, w ) = 


du 


dv 


d(h w a w ) d(h v a v ) 


h\/h w 


dv dw 

1 f d(h u a u ) _ d(h w a w ) 
h w h u \ dw du 

1 fd{h v a v ) d(h u a u )> 


u 


h,,h 


U * 1 'V 


du 


dv 


w 


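You should check that this can be written as

Curl in curvilinear coords:

$$
\mathrm{curl}\,\mathbf{a}(u, v, w) = \frac{1}{h_u h_v h_w}
\begin{vmatrix}
h_u \hat{\mathbf{u}} & h_v \hat{\mathbf{v}} & h_w \hat{\mathbf{w}} \\
\dfrac{\partial}{\partial u} & \dfrac{\partial}{\partial v} & \dfrac{\partial}{\partial w} \\
h_u a_u & h_v a_v & h_w a_w
\end{vmatrix}
$$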


6.12 The Laplacian in curvilinear coordinates 

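Substitution of the components of grad $U$ into the expression for div $\mathbf{a}$ immediately
(!*?) gives the following expression for the Laplacian in general orthogonal
coordinates:

$$
\nabla^2 U = \frac{1}{h_u h_v h_w} \left[ \frac{\partial}{\partial u}\left( \frac{h_v h_w}{h_u} \frac{\partial U}{\partial u} \right) + \frac{\partial}{\partial v}\left( \frac{h_w h_u}{h_v} \frac{\partial U}{\partial v} \right) + \frac{\partial}{\partial w}\left( \frac{h_u h_v}{h_w} \frac{\partial U}{\partial w} \right) \right]
$$


6.13 Grad, Div, Curl, ∇² in cylindrical polars

Here $(u, v, w) \to (r, \phi, z)$. The position vector is
$\mathbf{r} = r\cos\phi\,\hat{\mathbf{i}} + r\sin\phi\,\hat{\mathbf{j}} + z\hat{\mathbf{k}}$, and
$h_r = |\partial \mathbf{r}/\partial r|$, etc.

$$
\begin{aligned}
\Rightarrow \quad h_r &= \sqrt{\cos^2\phi + \sin^2\phi} = 1, \\
h_\phi &= \sqrt{r^2\sin^2\phi + r^2\cos^2\phi} = r, \\
h_z &= 1
\end{aligned}
$$

$$
\begin{aligned}
\mathrm{grad}\,U &= \frac{\partial U}{\partial r}\hat{\mathbf{e}}_r + \frac{1}{r}\frac{\partial U}{\partial \phi}\hat{\mathbf{e}}_\phi + \frac{\partial U}{\partial z}\hat{\mathbf{k}} \\
\mathrm{div}\,\mathbf{a} &= \frac{1}{r}\left( \frac{\partial (r a_r)}{\partial r} + \frac{\partial a_\phi}{\partial \phi} \right) + \frac{\partial a_z}{\partial z} \\
\mathrm{curl}\,\mathbf{a} &= \left( \frac{1}{r}\frac{\partial a_z}{\partial \phi} - \frac{\partial a_\phi}{\partial z} \right)\hat{\mathbf{e}}_r + \left( \frac{\partial a_r}{\partial z} - \frac{\partial a_z}{\partial r} \right)\hat{\mathbf{e}}_\phi + \frac{1}{r}\left( \frac{\partial (r a_\phi)}{\partial r} - \frac{\partial a_r}{\partial \phi} \right)\hat{\mathbf{k}} \\
\nabla^2 U &= \text{Tutorial Exercise}
\end{aligned}
$$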


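6.14 Grad, Div, Curl, ∇² in spherical polars

Here $(u, v, w) \to (r, \theta, \phi)$. The position vector is
$\mathbf{r} = r\sin\theta\cos\phi\,\hat{\mathbf{i}} + r\sin\theta\sin\phi\,\hat{\mathbf{j}} + r\cos\theta\,\hat{\mathbf{k}}$.

$$
\begin{aligned}
\Rightarrow \quad h_r &= \sqrt{\sin^2\theta(\cos^2\phi + \sin^2\phi) + \cos^2\theta} = 1 \\
h_\theta &= \sqrt{r^2\cos^2\theta(\cos^2\phi + \sin^2\phi) + r^2\sin^2\theta} = r \\
h_\phi &= \sqrt{r^2\sin^2\theta(\sin^2\phi + \cos^2\phi)} = r\sin\theta
\end{aligned}
$$

$$
\begin{aligned}
\mathrm{grad}\,U &= \frac{\partial U}{\partial r}\hat{\mathbf{e}}_r + \frac{1}{r}\frac{\partial U}{\partial \theta}\hat{\mathbf{e}}_\theta + \frac{1}{r\sin\theta}\frac{\partial U}{\partial \phi}\hat{\mathbf{e}}_\phi \\
\mathrm{div}\,\mathbf{a} &= \frac{1}{r^2}\frac{\partial (r^2 a_r)}{\partial r} + \frac{1}{r\sin\theta}\frac{\partial (a_\theta \sin\theta)}{\partial \theta} + \frac{1}{r\sin\theta}\frac{\partial a_\phi}{\partial \phi} \\
\mathrm{curl}\,\mathbf{a} &= \frac{\hat{\mathbf{e}}_r}{r\sin\theta}\left( \frac{\partial}{\partial \theta}(a_\phi \sin\theta) - \frac{\partial}{\partial \phi}(a_\theta) \right) + \frac{\hat{\mathbf{e}}_\theta}{r\sin\theta}\left( \frac{\partial}{\partial \phi}(a_r) - \frac{\partial}{\partial r}(a_\phi r\sin\theta) \right) + \frac{\hat{\mathbf{e}}_\phi}{r}\left( \frac{\partial}{\partial r}(a_\theta r) - \frac{\partial}{\partial \theta}(a_r) \right) \\
\nabla^2 U &= \text{Tutorial Exercise}
\end{aligned}
$$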
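
The scale factors quoted for spherical polars can be checked numerically. The sketch below is only an illustration, not part of the derivation: it estimates $h_r$, $h_\theta$, $h_\phi$ as the magnitudes of central-difference derivatives of the position vector. The function names `position` and `scale_factors` and the step size `eps` are choices made for this example.

```python
import math

def position(r, theta, phi):
    """Cartesian position vector for spherical polars (r, theta, phi)."""
    return (r * math.sin(theta) * math.cos(phi),
            r * math.sin(theta) * math.sin(phi),
            r * math.cos(theta))

def scale_factors(r, theta, phi, eps=1e-6):
    """Estimate (h_r, h_theta, h_phi) = |d(position)/d(coordinate)| by central differences."""
    q = [r, theta, phi]
    factors = []
    for i in range(3):
        hi = list(q)
        lo = list(q)
        hi[i] += eps
        lo[i] -= eps
        p_hi = position(*hi)
        p_lo = position(*lo)
        deriv = [(a - b) / (2 * eps) for a, b in zip(p_hi, p_lo)]
        factors.append(math.sqrt(sum(d * d for d in deriv)))
    return tuple(factors)

if __name__ == "__main__":
    # Expect values close to (1, r, r sin(theta))
    print(scale_factors(2.0, 0.7, 1.1))
```

At $r = 2$, $\theta = 0.7$ the estimates should agree with $1$, $r$ and $r\sin\theta$ to within the finite-difference error.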


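Examples

Q1 Find curl $\mathbf{a}$ in (i) Cartesians and (ii) Spherical polars when $\mathbf{a} = x(x\hat{\mathbf{i}} + y\hat{\mathbf{j}} + z\hat{\mathbf{k}})$.

A1 (i) In Cartesians

$$
\mathrm{curl}\,\mathbf{a} =
\begin{vmatrix}
\hat{\mathbf{i}} & \hat{\mathbf{j}} & \hat{\mathbf{k}} \\
\partial/\partial x & \partial/\partial y & \partial/\partial z \\
x^2 & xy & xz
\end{vmatrix}
= -z\hat{\mathbf{j}} + y\hat{\mathbf{k}} .
$$

(ii) In spherical polars, $x = r\sin\theta\cos\phi$ and $\mathbf{r} = (x\hat{\mathbf{i}} + y\hat{\mathbf{j}} + z\hat{\mathbf{k}})$. So

$$
\mathbf{a} = r^2\sin\theta\cos\phi\,\hat{\mathbf{e}}_r
\quad\Rightarrow\quad a_r = r^2\sin\theta\cos\phi; \quad a_\theta = 0; \quad a_\phi = 0 .
$$

Hence as

$$
\mathrm{curl}\,\mathbf{a} = \frac{\hat{\mathbf{e}}_r}{r\sin\theta}\left( \frac{\partial}{\partial \theta}(a_\phi \sin\theta) - \frac{\partial}{\partial \phi}(a_\theta) \right) + \frac{\hat{\mathbf{e}}_\theta}{r\sin\theta}\left( \frac{\partial}{\partial \phi}(a_r) - \frac{\partial}{\partial r}(a_\phi r\sin\theta) \right) + \frac{\hat{\mathbf{e}}_\phi}{r}\left( \frac{\partial}{\partial r}(a_\theta r) - \frac{\partial}{\partial \theta}(a_r) \right)
$$

we find

$$
\begin{aligned}
\mathrm{curl}\,\mathbf{a} &= \frac{\hat{\mathbf{e}}_\theta}{r\sin\theta}\left( \frac{\partial}{\partial \phi}(r^2\sin\theta\cos\phi) \right) + \frac{\hat{\mathbf{e}}_\phi}{r}\left( -\frac{\partial}{\partial \theta}(r^2\sin\theta\cos\phi) \right) \\
&= \frac{\hat{\mathbf{e}}_\theta}{r\sin\theta}(-r^2\sin\theta\sin\phi) + \frac{\hat{\mathbf{e}}_\phi}{r}(-r^2\cos\theta\cos\phi) \\
&= \hat{\mathbf{e}}_\theta(-r\sin\phi) + \hat{\mathbf{e}}_\phi(-r\cos\theta\cos\phi)
\end{aligned}
$$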

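Checking: these two results should be the same, but to check we need expressions
for $\hat{\mathbf{e}}_\theta$, $\hat{\mathbf{e}}_\phi$ in terms of $\hat{\mathbf{i}}$ etc.

Remember that we can work out the unit vectors $\hat{\mathbf{e}}_r$ and so on in terms of $\hat{\mathbf{i}}$
etc using

$$
\hat{\mathbf{e}}_r = \frac{1}{h_r}\frac{\partial \mathbf{r}}{\partial r}; \quad
\hat{\mathbf{e}}_\theta = \frac{1}{h_\theta}\frac{\partial \mathbf{r}}{\partial \theta}; \quad
\hat{\mathbf{e}}_\phi = \frac{1}{h_\phi}\frac{\partial \mathbf{r}}{\partial \phi}
\quad\text{where}\quad \mathbf{r} = x\hat{\mathbf{i}} + y\hat{\mathbf{j}} + z\hat{\mathbf{k}} .
$$

Grinding through we find

$$
\begin{bmatrix} \hat{\mathbf{e}}_r \\ \hat{\mathbf{e}}_\theta \\ \hat{\mathbf{e}}_\phi \end{bmatrix}
=
\begin{bmatrix}
\sin\theta\cos\phi & \sin\theta\sin\phi & \cos\theta \\
\cos\theta\cos\phi & \cos\theta\sin\phi & -\sin\theta \\
-\sin\phi & \cos\phi & 0
\end{bmatrix}
\begin{bmatrix} \hat{\mathbf{i}} \\ \hat{\mathbf{j}} \\ \hat{\mathbf{k}} \end{bmatrix}
= [R]
\begin{bmatrix} \hat{\mathbf{i}} \\ \hat{\mathbf{j}} \\ \hat{\mathbf{k}} \end{bmatrix}
$$

Don’t be shocked to see a rotation matrix [R]: we are after all rotating one
right-handed orthogonal coord system into another.

So the result in spherical polars is

$$
\begin{aligned}
\mathrm{curl}\,\mathbf{a} &= (\cos\theta\cos\phi\,\hat{\mathbf{i}} + \cos\theta\sin\phi\,\hat{\mathbf{j}} - \sin\theta\,\hat{\mathbf{k}})(-r\sin\phi) + (-\sin\phi\,\hat{\mathbf{i}} + \cos\phi\,\hat{\mathbf{j}})(-r\cos\theta\cos\phi) \\
&= -r\cos\theta\,\hat{\mathbf{j}} + r\sin\theta\sin\phi\,\hat{\mathbf{k}} \\
&= -z\hat{\mathbf{j}} + y\hat{\mathbf{k}}
\end{aligned}
$$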

which is exactly the result in Cartesians. 
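
A quick numerical cross-check of Q1 is sketched below. It is an illustration under stated assumptions rather than part of the worked answer: the curl is estimated by central differences in Cartesians and should reproduce $-z\hat{\mathbf{j}} + y\hat{\mathbf{k}}$. The helper names `field` and `curl` and the step `eps` are chosen for this example only.

```python
def field(x, y, z):
    """The field of Q1: a = x (x i + y j + z k)."""
    return (x * x, x * y, x * z)

def curl(f, p, eps=1e-5):
    """Central-difference estimate of curl f at the point p = (x, y, z)."""
    def d(comp, axis):
        # Partial derivative of component `comp` with respect to coordinate `axis`
        hi = list(p)
        lo = list(p)
        hi[axis] += eps
        lo[axis] -= eps
        return (f(*hi)[comp] - f(*lo)[comp]) / (2 * eps)
    return (d(2, 1) - d(1, 2),   # d(a_z)/dy - d(a_y)/dz
            d(0, 2) - d(2, 0),   # d(a_x)/dz - d(a_z)/dx
            d(1, 0) - d(0, 1))   # d(a_y)/dx - d(a_x)/dy

if __name__ == "__main__":
    x, y, z = 1.3, -0.4, 2.1
    print(curl(field, (x, y, z)))   # compare with (0, -z, y)
```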


Q2 Find the divergence of the vector field $\mathbf{a} = r\mathbf{c}$ where $\mathbf{c}$ is a constant vector
(i) using Cartesian coordinates and (ii) using Spherical Polar coordinates.


A2 (i) Using Cartesian coords:

$$
\begin{aligned}
\mathrm{div}\,\mathbf{a} &= \frac{\partial}{\partial x}(x^2 + y^2 + z^2)^{1/2} c_x + \ldots \\
&= x\,(x^2 + y^2 + z^2)^{-1/2} c_x + \ldots \\
&= \frac{1}{r}\,\mathbf{r} \cdot \mathbf{c} .
\end{aligned}
$$


(ii) Using Spherical polars 
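$$\mathbf{a} = a_r \hat{\mathbf{e}}_r + a_\theta \hat{\mathbf{e}}_\theta + a_\phi \hat{\mathbf{e}}_\phi$$

and our first task is to find $a_r$ and so on. We can’t do this by inspection, and
finding their values requires more work than you might think! Recall

$$
\begin{bmatrix} \hat{\mathbf{e}}_r \\ \hat{\mathbf{e}}_\theta \\ \hat{\mathbf{e}}_\phi \end{bmatrix}
=
\begin{bmatrix}
\sin\theta\cos\phi & \sin\theta\sin\phi & \cos\theta \\
\cos\theta\cos\phi & \cos\theta\sin\phi & -\sin\theta \\
-\sin\phi & \cos\phi & 0
\end{bmatrix}
\begin{bmatrix} \hat{\mathbf{i}} \\ \hat{\mathbf{j}} \\ \hat{\mathbf{k}} \end{bmatrix}
= [R]
\begin{bmatrix} \hat{\mathbf{i}} \\ \hat{\mathbf{j}} \\ \hat{\mathbf{k}} \end{bmatrix}
$$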


Now the point is the same point in space whatever the coordinate system, so 
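$$a_r \hat{\mathbf{e}}_r + a_\theta \hat{\mathbf{e}}_\theta + a_\phi \hat{\mathbf{e}}_\phi = a_x \hat{\mathbf{i}} + a_y \hat{\mathbf{j}} + a_z \hat{\mathbf{k}}$$

and using the inner product

$$
\begin{aligned}
\begin{bmatrix} a_r \\ a_\theta \\ a_\phi \end{bmatrix}^{T}
\begin{bmatrix} \hat{\mathbf{e}}_r \\ \hat{\mathbf{e}}_\theta \\ \hat{\mathbf{e}}_\phi \end{bmatrix}
&=
\begin{bmatrix} a_x \\ a_y \\ a_z \end{bmatrix}^{T}
\begin{bmatrix} \hat{\mathbf{i}} \\ \hat{\mathbf{j}} \\ \hat{\mathbf{k}} \end{bmatrix} \\
\begin{bmatrix} a_r \\ a_\theta \\ a_\phi \end{bmatrix}^{T} [R]
\begin{bmatrix} \hat{\mathbf{i}} \\ \hat{\mathbf{j}} \\ \hat{\mathbf{k}} \end{bmatrix}
&=
\begin{bmatrix} a_x \\ a_y \\ a_z \end{bmatrix}^{T}
\begin{bmatrix} \hat{\mathbf{i}} \\ \hat{\mathbf{j}} \\ \hat{\mathbf{k}} \end{bmatrix} \\
\Rightarrow \quad
\begin{bmatrix} a_r \\ a_\theta \\ a_\phi \end{bmatrix}^{T} [R]
&=
\begin{bmatrix} a_x \\ a_y \\ a_z \end{bmatrix}^{T} \\
\Rightarrow \quad
[R]^{T} \begin{bmatrix} a_r \\ a_\theta \\ a_\phi \end{bmatrix}
&=
\begin{bmatrix} a_x \\ a_y \\ a_z \end{bmatrix} \\
\Rightarrow \quad
\begin{bmatrix} a_r \\ a_\theta \\ a_\phi \end{bmatrix}
&= \left([R]^{T}\right)^{-1} \begin{bmatrix} a_x \\ a_y \\ a_z \end{bmatrix}
= [R] \begin{bmatrix} a_x \\ a_y \\ a_z \end{bmatrix}
\end{aligned}
$$

where the last step uses the fact that $[R]$ is a rotation matrix, so $[R]^{-1} = [R]^{T}$.
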


For our particular problem, a x = rc x , etc, where c x is a constant, so now we 
can write down 


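$$
\begin{aligned}
a_r &= r(\sin\theta\cos\phi\, c_x + \sin\theta\sin\phi\, c_y + \cos\theta\, c_z) \\
a_\theta &= r(\cos\theta\cos\phi\, c_x + \cos\theta\sin\phi\, c_y - \sin\theta\, c_z) \\
a_\phi &= r(-\sin\phi\, c_x + \cos\phi\, c_y)
\end{aligned}
$$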


Now all we need to do is to bash out 

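$$
\mathrm{div}\,\mathbf{a} = \frac{1}{r^2}\frac{\partial (r^2 a_r)}{\partial r} + \frac{1}{r\sin\theta}\frac{\partial (a_\theta \sin\theta)}{\partial \theta} + \frac{1}{r\sin\theta}\frac{\partial a_\phi}{\partial \phi}
$$

In glorious detail this is

$$
\begin{aligned}
\mathrm{div}\,\mathbf{a} = {} & 3(\sin\theta\cos\phi\, c_x + \sin\theta\sin\phi\, c_y + \cos\theta\, c_z) \\
& + \frac{1}{\sin\theta}\left( (\cos^2\theta - \sin^2\theta)(\cos\phi\, c_x + \sin\phi\, c_y) - 2\sin\theta\cos\theta\, c_z \right) \\
& + \frac{1}{\sin\theta}(-\cos\phi\, c_x - \sin\phi\, c_y)
\end{aligned}
$$

A bit more bashing and you’ll find

$$
\begin{aligned}
\mathrm{div}\,\mathbf{a} &= \sin\theta\cos\phi\, c_x + \sin\theta\sin\phi\, c_y + \cos\theta\, c_z \\
&= \hat{\mathbf{e}}_r \cdot \mathbf{c}
\end{aligned}
$$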

This is EXACTLY what you worked out before of course. 
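
As a sanity check on Q2, the sketch below estimates div $(r\mathbf{c})$ by central differences in Cartesians and compares it with $\hat{\mathbf{e}}_r \cdot \mathbf{c} = \mathbf{r}\cdot\mathbf{c}/r$. It is illustrative only; the names `make_field` and `divergence`, the test point and the vector `c` are assumptions made for the example.

```python
import math

def make_field(c):
    """Return the field a(x, y, z) = r c for a constant vector c."""
    def a(x, y, z):
        r = math.sqrt(x * x + y * y + z * z)
        return (r * c[0], r * c[1], r * c[2])
    return a

def divergence(f, p, eps=1e-5):
    """Central-difference estimate of div f at the point p = (x, y, z)."""
    total = 0.0
    for axis in range(3):
        hi = list(p)
        lo = list(p)
        hi[axis] += eps
        lo[axis] -= eps
        total += (f(*hi)[axis] - f(*lo)[axis]) / (2 * eps)
    return total

if __name__ == "__main__":
    c = (0.5, -1.0, 2.0)
    p = (1.0, 2.0, -2.0)                       # |p| = 3
    r = math.sqrt(sum(q * q for q in p))
    exact = sum(pi * ci for pi, ci in zip(p, c)) / r
    print(divergence(make_field(c), p), exact)
```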


Take home messages from these examples: 

• Just as physical vectors are independent of their coordinate systems, so are 
differential operators. 

• Don't forget about the vector geometry you did in the 1st year. Rotation 
matrices are useful! 

• Spherical polars were NOT a good coordinate system in which to think about
this problem. Let the symmetry guide you.

Lecture 7


Gauss’ and Stokes’ Theorems 


This section finally begins to deliver on why we introduced div, grad and curl. Two
theorems, both of them over two hundred years old, are explained:

• Gauss’ Theorem enables an integral taken over a volume to be replaced by
one taken over the surface bounding that volume, and vice versa. Why would
we want to do that? Computational efficiency and/or numerical accuracy!

• Stokes’ Theorem enables an integral taken around a closed curve to be replaced
by one taken over any surface bounded by that curve.


7.1 Gauss’ Theorem 

Suppose that a(r) is a vector field and we want to compute the total flux of the 
field across the surface S that bounds a volume V. That is, we are interested in 
calculating: 


$$\int_S \mathbf{a} \cdot d\mathbf{S}$$



Figure 7.1: The surface element $d\mathbf{S}$ must stick out of the surface.

where, recall, $d\mathbf{S}$ is normal to the locally planar surface element and must
everywhere point out of the volume as shown in Figure 7.1.

Gauss’ Theorem tells us that we can do this by considering the total flux generated 
inside the volume V: 


Gauss’ Theorem 


$$\int_S \mathbf{a} \cdot d\mathbf{S} = \int_V \mathrm{div}\,\mathbf{a}\; dV$$


obtained by integrating the divergence over the entire volume. 


7.2 Informal proof 

A non-rigorous proof can be realized by recalling that we defined div by considering
the efflux $dE$ from the surfaces of an infinitesimal volume element

$$dE = \mathbf{a} \cdot d\mathbf{S}$$

and defining it as

$$\mathrm{div}\,\mathbf{a}\; dV = dE = \mathbf{a} \cdot d\mathbf{S} .$$

If we sum over the volume elements, this results in a sum over the surface elements.
But if two elemental surfaces touch, their $d\mathbf{S}$ vectors are in opposing directions and
cancel, as shown in Figure 7.2. Thus the sum over surface elements leaves only the
overall bounding surface.



Figure 7.2: When two elements touch, the dS vectors at the common surface cancel out. One
can imagine building the entire volume up from the infinitesimal units.


Example of Gauss’ Theorem

This is a typical example, in which the surface integral is rather tedious, whereas 
the volume integral is straightforward. 

Q Derive $\int_S \mathbf{a} \cdot d\mathbf{S}$ where $\mathbf{a} = z^3 \hat{\mathbf{k}}$ and $S$ is the surface of a sphere of radius $R$
centred on the origin:

1. directly; 

2. by applying Gauss' Theorem 




Figure 7.3: The sphere divided into discs of constant z and thickness dz.


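A (1) On the surface of the sphere, $\mathbf{a} = R^3\cos^3\theta\,\hat{\mathbf{k}}$ and $d\mathbf{S} = R^2\sin\theta\, d\theta\, d\phi\, \hat{\mathbf{e}}_r$.
Everywhere $\hat{\mathbf{e}}_r \cdot \hat{\mathbf{k}} = \cos\theta$.

$$
\begin{aligned}
\Rightarrow \int_S \mathbf{a} \cdot d\mathbf{S} &= \int_{\phi=0}^{2\pi}\int_{\theta=0}^{\pi} R^3\cos^3\theta \,.\, R^2\sin\theta\, d\theta\, d\phi\; \hat{\mathbf{e}}_r \cdot \hat{\mathbf{k}} \\
&= \int_{\phi=0}^{2\pi}\int_{\theta=0}^{\pi} R^3\cos^3\theta \,.\, R^2\sin\theta\, d\theta\, d\phi \,.\, \cos\theta \\
&= 2\pi R^5 \int_0^{\pi} \cos^4\theta \sin\theta\, d\theta \\
&= \frac{2\pi R^5}{5}\left[ -\cos^5\theta \right]_0^{\pi} = \frac{4\pi R^5}{5}
\end{aligned}
$$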


(2) To apply Gauss' Theorem, we need to figure out div a and decide how to 
compute the volume integral. The first is easy: 

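$$\mathrm{div}\,\mathbf{a} = 3z^2$$

For the second, because div $\mathbf{a}$ involves just $z$, we can divide the sphere into
discs of constant $z$ and thickness $dz$, as shown in Fig. 7.3. Then

$$dV = \pi(R^2 - z^2)\,dz$$

and

$$
\int_V \mathrm{div}\,\mathbf{a}\; dV = 3\pi \int_{-R}^{R} z^2 (R^2 - z^2)\, dz = 3\pi \left[ \frac{R^2 z^3}{3} - \frac{z^5}{5} \right]_{-R}^{R} = \frac{4\pi R^5}{5}
$$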
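
The two routes to $4\pi R^5/5$ can also be compared numerically. The following is a minimal sketch, assuming a simple midpoint rule is accurate enough; `surface_flux` and `volume_integral` are names invented for this illustration.

```python
import math

def surface_flux(R, n=2000):
    """Flux of a = z^3 k through the sphere: 2 pi R^5 * integral of cos^4(t) sin(t) dt."""
    dt = math.pi / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * dt              # midpoint of the i-th theta strip
        total += math.cos(t) ** 4 * math.sin(t) * dt
    return 2 * math.pi * R ** 5 * total

def volume_integral(R, n=2000):
    """Integral of div a = 3 z^2 over the sphere, using discs dV = pi (R^2 - z^2) dz."""
    dz = 2 * R / n
    total = 0.0
    for i in range(n):
        z = -R + (i + 0.5) * dz         # midpoint height of the i-th disc
        total += 3 * z * z * math.pi * (R * R - z * z) * dz
    return total

if __name__ == "__main__":
    R = 1.5
    print(surface_flux(R), volume_integral(R), 4 * math.pi * R ** 5 / 5)
```

All three printed numbers should agree to several significant figures.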


7.3 Surface versus volume integrals 

At first sight, it might seem that, on a computer, performing a surface integral
would be better than performing a volume integral, perhaps because there are, somehow,
“fewer elements”. However, this is not the case. Imagine doing a surface integral
over a wrinkly surface, say that of the moon. All the elements involved in the
integration are “difficult” and must be modelled correctly. With a volume integral,
most of the elements are not at the surface, and so the bulk of the integral is
done without accurate modelling. The computation is easier, faster, and better
conditioned numerically.

7.4 Extension to Gauss’ Theorem 

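Suppose the vector field $\mathbf{a}(\mathbf{r})$ is of the form $\mathbf{a} = U(\mathbf{r})\mathbf{c}$, where $U(\mathbf{r})$ is a scalar field
and $\mathbf{c}$ is a constant vector. Then, as we showed in the previous lecture,

$$
\begin{aligned}
\mathrm{div}\,\mathbf{a} &= \mathrm{grad}\,U \cdot \mathbf{c} + U\,\mathrm{div}\,\mathbf{c} \\
&= \mathrm{grad}\,U \cdot \mathbf{c}
\end{aligned}
$$

since div $\mathbf{c} = 0$ because $\mathbf{c}$ is constant.

Gauss' Theorem becomes

$$\int_S U\mathbf{c} \cdot d\mathbf{S} = \int_V \mathrm{grad}\,U \cdot \mathbf{c}\; dV$$

or, alternatively, taking the constant $\mathbf{c}$ out of the integrals

$$\mathbf{c} \cdot \left( \int_S U\, d\mathbf{S} \right) = \mathbf{c} \cdot \left( \int_V \mathrm{grad}\,U\; dV \right)$$

This is still a scalar equation but we now note that the vector $\mathbf{c}$ is arbitrary so
that the result must be true for any vector $\mathbf{c}$. This can be true only if the vector
equation

$$\int_S U\, d\mathbf{S} = \int_V \mathrm{grad}\,U\; dV$$

is satisfied.

If you think this is fishy, just write $\mathbf{c} = \hat{\mathbf{i}}$, then $\mathbf{c} = \hat{\mathbf{j}}$, and $\mathbf{c} = \hat{\mathbf{k}}$ in turn, and you
must obtain the three components of $\int_S U\, d\mathbf{S}$ in turn.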

Further “extensions” can be obtained of course. For example one might be able 
to write the vector field of interest as 

a(r) = b(r) x c 

where c is a constant vector. 


Example of extension to Gauss’ Theorem

Q $U = x^2 + y^2 + z^2$ is a scalar field, and volume $V$ is the cylinder
$x^2 + y^2 \le a^2$, $0 \le z \le h$. Compute the surface integral

$$\int_S U\, d\mathbf{S}$$

over the surface of the cylinder.

A It is immediately clear from symmetry that there is no contribution
from the curved surface of the cylinder since for every vector surface
element there exists an equal and opposite element with the same value
of $U$. We therefore need consider only the top and bottom faces.

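Top face:

$$U = x^2 + y^2 + z^2 = r^2 + h^2 \quad\text{and}\quad d\mathbf{S} = r\, dr\, d\phi\, \hat{\mathbf{k}}$$

so

$$
\int U\, d\mathbf{S} = \int_{r=0}^{a} (h^2 + r^2)\, r\, dr \int_{\phi=0}^{2\pi} d\phi\; \hat{\mathbf{k}}
= \hat{\mathbf{k}}\,\pi \left[ h^2 r^2 + \tfrac{1}{2} r^4 \right]_0^a
= \pi \left[ h^2 a^2 + \tfrac{1}{2} a^4 \right] \hat{\mathbf{k}}
$$

Bottom face:

$$U = r^2 \quad\text{and}\quad d\mathbf{S} = -r\, dr\, d\phi\, \hat{\mathbf{k}}$$

The contribution from this face is thus $-\tfrac{1}{2}\pi a^4 \hat{\mathbf{k}}$ and the total integral is $\pi h^2 a^2 \hat{\mathbf{k}}$.

On the other hand, using Gauss' Theorem we have to compute

$$\int_V \mathrm{grad}\,U\; dV$$

In this case, grad $U = 2\mathbf{r}$, so the integral is

$$2 \int_V (x\hat{\mathbf{i}} + y\hat{\mathbf{j}} + z\hat{\mathbf{k}})\, r\, dr\, dz\, d\phi$$

The integrations over $x$ and $y$ are zero by symmetry, so that the only remaining
part is

$$
2 \int_{z=0}^{h} z\, dz \int_{r=0}^{a} r\, dr \int_{\phi=0}^{2\pi} d\phi\; \hat{\mathbf{k}} = \pi a^2 h^2 \hat{\mathbf{k}}
$$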
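
The cylinder example can be checked in a few lines. The sketch below is an illustration only: it evaluates the $\hat{\mathbf{k}}$ component of both sides of $\int_S U\, d\mathbf{S} = \int_V \mathrm{grad}\,U\, dV$ with a midpoint rule, using thin rings on the end faces and thin discs through the volume. The names `surface_side` and `volume_side` are invented for this example.

```python
import math

def surface_side(a, h, n=1000):
    """k-component of the integral of U dS over the top (z = h) and bottom (z = 0) faces."""
    dr = a / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * dr
        ring = 2 * math.pi * r * dr          # area of a thin ring of radius r
        total += (r * r + h * h) * ring      # top face, U = r^2 + h^2, normal +k
        total -= (r * r) * ring              # bottom face, U = r^2, normal -k
    return total

def volume_side(a, h, n=1000):
    """k-component of the integral of grad U = 2 r over the cylinder."""
    dz = h / n
    total = 0.0
    for i in range(n):
        z = (i + 0.5) * dz
        total += 2 * z * math.pi * a * a * dz   # disc of area pi a^2 at height z
    return total

if __name__ == "__main__":
    a, h = 2.0, 3.0
    print(surface_side(a, h), volume_side(a, h), math.pi * a * a * h * h)
```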


7.5 Stokes’ Theorem 

Stokes’ Theorem relates a line integral around a closed path to a surface integral 
over what is called a capping surface of the path. 


Stokes' Theorem states: 


$$\oint_C \mathbf{a} \cdot d\mathbf{l} = \int_S \mathrm{curl}\,\mathbf{a} \cdot d\mathbf{S}$$

where $S$ is any surface capping the curve $C$.


Why have we used $d\mathbf{l}$ rather than $d\mathbf{r}$, where $\mathbf{r}$ is the position vector?

There is no good reason for this, as $d\mathbf{l} = d\mathbf{r}$. It just seems to be common usage
in line integrals!


7.6 Informal proof 

You will recall that in Lecture 5 we defined curl as the circulation per unit
area, and showed that

$$\oint_{\text{around elemental loop}} \mathbf{a} \cdot d\mathbf{l} = dC = (\nabla \times \mathbf{a}) \cdot d\mathbf{S} .$$


Now if we add these little loops together, the internal line sections cancel out
because the $d\mathbf{l}$'s are in opposite directions but the field $\mathbf{a}$ is not. This gives the
larger surface and the larger bounding contour as shown in Fig. 7.4.



Figure 7.4: An example of an elementary loop, and how they combine together. 


For a given contour, the capping surface can be ANY surface bounded by
the contour. The only requirement is that the surface element vectors point in
the “general direction” of a right-handed screw with respect to the sense of the
contour integral. See Fig. 7.5.




Figure 7.5: For a given contour, the bounding surface can be any shape. The dS's must have a positive
component in the sense of a right-handed screw with respect to the contour sense.


Example of Stokes’ Theorem

In practice, (and especially in exam questions!) the bounding contour is often 
planar, and the capping surface flat or hemispherical or cylindrical. 

Q Vector field $\mathbf{a} = x^3 \hat{\mathbf{j}} - y^3 \hat{\mathbf{i}}$ and $C$ is the circle of radius $R$ centred on the origin.
Derive

$$\oint_C \mathbf{a} \cdot d\mathbf{l}$$

(i) directly and (ii) using Stokes' theorem where the surface is the planar surface
bounded by the contour.

A(i) Directly. On the circle of radius R 

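$$\mathbf{a} = R^3(-\sin^3\theta\,\hat{\mathbf{i}} + \cos^3\theta\,\hat{\mathbf{j}})$$

and

$$d\mathbf{l} = R\, d\theta\,(-\sin\theta\,\hat{\mathbf{i}} + \cos\theta\,\hat{\mathbf{j}})$$

so that:

$$\oint_C \mathbf{a} \cdot d\mathbf{l} = \int_0^{2\pi} R^4(\sin^4\theta + \cos^4\theta)\, d\theta = \frac{3\pi}{2} R^4 ,$$

since

$$\int_0^{2\pi} \sin^4\theta\, d\theta = \int_0^{2\pi} \cos^4\theta\, d\theta = \frac{3\pi}{4}$$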


A(ii) Using Stokes' theorem ... 


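$$
\mathrm{curl}\,\mathbf{a} =
\begin{vmatrix}
\hat{\mathbf{i}} & \hat{\mathbf{j}} & \hat{\mathbf{k}} \\
\dfrac{\partial}{\partial x} & \dfrac{\partial}{\partial y} & \dfrac{\partial}{\partial z} \\
-y^3 & x^3 & 0
\end{vmatrix}
= 3(x^2 + y^2)\hat{\mathbf{k}} = 3r^2 \hat{\mathbf{k}}
$$

We choose area elements to be circular strips of radius $r$ and thickness $dr$. Then

$$
d\mathbf{S} = 2\pi r\, dr\, \hat{\mathbf{k}} \quad\text{and}\quad
\int_S \mathrm{curl}\,\mathbf{a} \cdot d\mathbf{S} = 6\pi \int_0^R r^3\, dr = \frac{3\pi}{2} R^4
$$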
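
Both sides of Stokes' Theorem for this example can be evaluated numerically as a check. The sketch below is illustrative only: it walks round the circle for the line integral and sums circular strips for the surface integral, both with a midpoint rule. The function names are invented for this example.

```python
import math

def line_integral(R, n=2000):
    """Integral of a . dl round the circle of radius R, with a = x^3 j - y^3 i."""
    dt = 2 * math.pi / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * dt
        x, y = R * math.cos(t), R * math.sin(t)
        ax, ay = -y ** 3, x ** 3                         # components of a on the circle
        dx, dy = -R * math.sin(t) * dt, R * math.cos(t) * dt
        total += ax * dx + ay * dy
    return total

def surface_integral(R, n=2000):
    """Integral of curl a . dS over the flat disc, with curl a = 3 r^2 k."""
    dr = R / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * dr
        total += 3 * r * r * 2 * math.pi * r * dr        # circular strip dS = 2 pi r dr
    return total

if __name__ == "__main__":
    R = 2.0
    print(line_integral(R), surface_integral(R), 1.5 * math.pi * R ** 4)
```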


7.7 An Extension to Stokes’ Theorem 

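Just as we considered one extension to Gauss' theorem (not really an extension,
more of a re-expression), so we will try something similar with Stokes' Theorem.

Again let $\mathbf{a}(\mathbf{r}) = U(\mathbf{r})\mathbf{c}$, where $\mathbf{c}$ is a constant vector. Then

$$\mathrm{curl}\,\mathbf{a} = U\,\mathrm{curl}\,\mathbf{c} + \mathrm{grad}\,U \times \mathbf{c}$$

Again, curl $\mathbf{c}$ is zero. Stokes' Theorem becomes in this case:

$$\oint_C U (\mathbf{c} \cdot d\mathbf{l}) = \int_S (\mathrm{grad}\,U \times \mathbf{c}) \cdot d\mathbf{S} = \int_S \mathbf{c} \cdot (d\mathbf{S} \times \mathrm{grad}\,U)$$

or, rearranging the triple scalar products and taking the constant $\mathbf{c}$ out of the
integrals gives

$$\mathbf{c} \cdot \oint_C U\, d\mathbf{l} = -\mathbf{c} \cdot \int_S \mathrm{grad}\,U \times d\mathbf{S} .$$

But $\mathbf{c}$ is arbitrary and so

$$\oint_C U\, d\mathbf{l} = -\int_S \mathrm{grad}\,U \times d\mathbf{S}$$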


7.8 Example of extension to Stokes’ Theorem


Q Derive $\oint_C U\, d\mathbf{r}$ (i) directly and (ii) using Stokes',
where $U = x^2 + y^2 + z^2$ and the line integral is taken
around $C$, the circle $(x - a)^2 + y^2 = a^2$ and $z = 0$.

Note that, for no special reason, we have used $d\mathbf{r}$
here not $d\mathbf{l}$.



A(i) First some preamble. 

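If the circle were centred at the origin, we would write
$d\mathbf{r} = a\, d\theta\, \hat{\mathbf{e}}_\theta = a\, d\theta(-\sin\theta\,\hat{\mathbf{i}} + \cos\theta\,\hat{\mathbf{j}})$.
For such a circle the magnitude $r = |\mathbf{r}| = a$, a constant, and so $dr = 0$.

However, in this example $d\mathbf{r}$ is not always in the direction of $\hat{\mathbf{e}}_\theta$, and $dr \neq 0$.
Could you write down $d\mathbf{r}$? If not, revise Lecture 3, where we saw that in plane
polars $x = r\cos\theta$, $y = r\sin\theta$ and the general expression is

$$d\mathbf{r} = dx\,\hat{\mathbf{i}} + dy\,\hat{\mathbf{j}} = (\cos\theta\, dr - r\sin\theta\, d\theta)\hat{\mathbf{i}} + (\sin\theta\, dr + r\cos\theta\, d\theta)\hat{\mathbf{j}}$$

To avoid having to find an expression for $r$ in terms of $\theta$, we will perform a
coordinate transformation by writing $\mathbf{r} = [a, 0]^T + \boldsymbol{\rho}$. So, $x = (a + \rho\cos\alpha)$
and $y = \rho\sin\alpha$, and on the circle itself where $\rho = a$

$$\mathbf{r} = a(1 + \cos\alpha)\hat{\mathbf{i}} + a\sin\alpha\,\hat{\mathbf{j}} ,$$

$$d\mathbf{r} = a\, d\alpha(-\sin\alpha\,\hat{\mathbf{i}} + \cos\alpha\,\hat{\mathbf{j}}) ,$$

and, as $z = 0$ on the circle,

$$U = a^2(1 + \cos\alpha)^2 + a^2\sin^2\alpha = 2a^2(1 + \cos\alpha) .$$

The line integral becomes

$$\oint_C U\, d\mathbf{r} = 2a^3 \int_{\alpha=0}^{2\pi} (1 + \cos\alpha)(-\sin\alpha\,\hat{\mathbf{i}} + \cos\alpha\,\hat{\mathbf{j}})\, d\alpha = 2\pi a^3 \hat{\mathbf{j}}$$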

A(ii) Now using Stokes' ... 

For a planar surface covering the disc, the surface element can be written 
using the new parametrization as 

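$$d\mathbf{S} = \rho\, d\rho\, d\alpha\, \hat{\mathbf{k}}$$

Remember that $U = x^2 + y^2 + z^2 = r^2$, and as $z = 0$ in the plane

$$\mathrm{grad}\,U = 2(x\hat{\mathbf{i}} + y\hat{\mathbf{j}} + z\hat{\mathbf{k}}) = 2(a + \rho\cos\alpha)\hat{\mathbf{i}} + 2\rho\sin\alpha\,\hat{\mathbf{j}} .$$

Be careful to note that $x$, $y$ are specified for any point on the disc, not on its
circular boundary!

So

$$
d\mathbf{S} \times \mathrm{grad}\,U = 2\rho\, d\rho\, d\alpha
\begin{vmatrix}
\hat{\mathbf{i}} & \hat{\mathbf{j}} & \hat{\mathbf{k}} \\
0 & 0 & 1 \\
(a + \rho\cos\alpha) & \rho\sin\alpha & 0
\end{vmatrix}
= 2\rho\left[ -\rho\sin\alpha\,\hat{\mathbf{i}} + (a + \rho\cos\alpha)\hat{\mathbf{j}} \right] d\rho\, d\alpha
$$

Both $\int_0^{2\pi} \sin\alpha\, d\alpha = 0$ and $\int_0^{2\pi} \cos\alpha\, d\alpha = 0$, so we are left with

$$
\int_S d\mathbf{S} \times \mathrm{grad}\,U = \int_{\rho=0}^{a} \int_{\alpha=0}^{2\pi} 2\rho a\, \hat{\mathbf{j}}\; d\rho\, d\alpha = 2\pi a^3 \hat{\mathbf{j}}
$$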
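
The vector result $2\pi a^3 \hat{\mathbf{j}}$ can be confirmed numerically for both routes. The sketch below is an illustration under simple assumptions (midpoint rule, planar capping disc); the function names and the grid size `n` are choices made for this example. Vectors are returned as $(x, y)$ pairs since the $\hat{\mathbf{k}}$ component vanishes.

```python
import math

def line_integral_U_dr(a, n=2000):
    """Vector integral of U dr round the circle (x - a)^2 + y^2 = a^2, z = 0."""
    dal = 2 * math.pi / n
    ix = iy = 0.0
    for i in range(n):
        al = (i + 0.5) * dal
        x, y = a * (1 + math.cos(al)), a * math.sin(al)
        U = x * x + y * y
        ix += U * (-a * math.sin(al)) * dal     # i-component of U dr
        iy += U * (a * math.cos(al)) * dal      # j-component of U dr
    return ix, iy

def surface_integral(a, n=200):
    """Vector integral of dS x grad U over the flat disc, with dS = rho drho dalpha k."""
    drho = a / n
    dal = 2 * math.pi / n
    ix = iy = 0.0
    for i in range(n):
        rho = (i + 0.5) * drho
        for j in range(n):
            al = (j + 0.5) * dal
            gx = 2 * (a + rho * math.cos(al))   # grad U = 2 (x, y, 0) on the disc
            gy = 2 * rho * math.sin(al)
            dA = rho * drho * dal
            ix += -gy * dA                      # k x (gx, gy, 0) = (-gy, gx, 0)
            iy += gx * dA
    return ix, iy

if __name__ == "__main__":
    a = 1.5
    print(line_integral_U_dr(a), surface_integral(a), 2 * math.pi * a ** 3)
```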
Lecture 8


Engineering applications 


In Lecture 6 we saw one classic example of the application of vector calculus to
Maxwell’s equations.

In this lecture we explore a few more examples from fluid mechanics and heat
transfer. As with Maxwell's equations, the examples show how vector calculus
provides a powerful way of representing underlying physics.

The power comes from the fact that div, grad and curl have a significance or
meaning which is more immediate than a collection of partial derivatives. Vector
calculus will, with practice, become a convenient shorthand for you.

• Electricity - Ampere's Law 

• Fluid Mechanics - The Continuity Equation 

• Thermo: The Heat Conduction Equation 

• Mechanics/Electrostatics - Conservative fields 

• The Inverse Square Law of force 

• Gravitational field due to distributed mass 

• Gravitational field inside body 

• Pressure forces in non-uniform flows

8.1 Electricity - Ampere’s Law

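If the frequency is low, the displacement current in Maxwell’s equation
$\mathrm{curl}\,\mathbf{H} = \mathbf{J} + \partial\mathbf{D}/\partial t$ is negligible, and we find

$$\mathrm{curl}\,\mathbf{H} = \mathbf{J}$$

Hence

$$\int_S \mathrm{curl}\,\mathbf{H} \cdot d\mathbf{S} = \int_S \mathbf{J} \cdot d\mathbf{S}$$

or

$$\oint \mathbf{H} \cdot d\mathbf{l} = \int_S \mathbf{J} \cdot d\mathbf{S}$$

where $\int_S \mathbf{J} \cdot d\mathbf{S}$ is the total current through the surface.

Now consider the $\mathbf{H}$ around a straight wire carrying current $I$. Symmetry tells us
that $\mathbf{H}$ is in the $\hat{\mathbf{e}}_\theta$ direction, in a right-handed screw sense with respect to the current. (You
might check this against the Biot-Savart law.)

Suppose we asked what is the magnitude of $\mathbf{H}$?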



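Inside the wire, the bounding contour only encloses a fraction $(\pi r^2)/(\pi A^2)$ of the
current, where $A$ is the radius of the wire, and so

$$H\, 2\pi r = \int \mathbf{J} \cdot d\mathbf{S} = I (r^2 / A^2)$$

$$\Rightarrow \quad H = \frac{I r}{2\pi A^2}$$

whereas outside we enclose all the current, and so

$$H\, 2\pi r = \int \mathbf{J} \cdot d\mathbf{S} = I$$

$$\Rightarrow \quad H = \frac{I}{2\pi r}$$

A plot is shown in the Figure.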
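
The two branches can be collected into a single function, which makes it easy to tabulate or plot $H$ against $r$. This is a minimal sketch assuming, as in the derivation above, that the current is spread uniformly over the cross-section; the function and argument names are invented for this illustration.

```python
import math

def field_magnitude(r, current, radius):
    """|H| at distance r from the axis of a straight wire carrying a uniform current.

    Inside the wire (r < radius) only the fraction r^2 / radius^2 of the
    current is enclosed; outside, all of it is.
    """
    if r < 0:
        raise ValueError("r must be non-negative")
    if r < radius:
        return current * r / (2 * math.pi * radius ** 2)
    return current / (2 * math.pi * r)

if __name__ == "__main__":
    # H rises linearly inside the wire and falls off as 1/r outside it
    for r in (0.0, 0.5, 1.0, 2.0, 4.0):
        print(r, field_magnitude(r, current=2.0, radius=1.0))
```

The two expressions agree at $r = A$, so the field is continuous across the surface of the wire.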


8.2 Fluid Mechanics - The Continuity Equation 

The Continuity Equation expresses the condition of conservation of mass in a 
fluid flow. The continuity principle applied to any volume (called a control volume ) 
may be expressed in words as follows: 

“The net rate of mass flow of fluid out of the control volume must equal 
the rate of decrease of the mass of fluid within the control volume" 



Figure 8.1: Control volume V, with fluid velocity q.

To express the above as a mathematical equation, we denote the velocity of the
fluid at each point of the flow by $\mathbf{q}(\mathbf{r})$ (a vector field) and the density by $\rho(\mathbf{r})$ (a
scalar field). The element of rate-of-volume-loss through surface $d\mathbf{S}$ is $d\dot{V} = \mathbf{q} \cdot d\mathbf{S}$,
so the rate of mass loss is

$$d\dot{M} = \rho\mathbf{q} \cdot d\mathbf{S},$$

so that the total rate of mass loss from the volume is

$$-\frac{\partial}{\partial t}\int_V \rho(\mathbf{r})\, dV = \int_S \rho\mathbf{q} \cdot d\mathbf{S} .$$

Assuming that the volume of interest is fixed, this is the same as

$$-\int_V \frac{\partial \rho}{\partial t}\, dV = \int_S \rho\mathbf{q} \cdot d\mathbf{S} .$$

Now we use Gauss' Theorem to transform the RHS into a volume integral

$$-\int_V \frac{\partial \rho}{\partial t}\, dV = \int_V \mathrm{div}\,(\rho\mathbf{q})\, dV .$$

The two volume integrals can be equal for any control volume $V$ only if the two
integrands are equal at each point of the flow. This leads to the mathematical
formulation of

The Continuity Equation:

$$\mathrm{div}\,(\rho\mathbf{q}) = -\frac{\partial \rho}{\partial t}$$

Notice that if the density doesn’t vary with time, div $(\rho\mathbf{q}) = 0$, and if the density
doesn’t vary with position then

The Continuity Equation for uniform, time-invariant density:

$$\mathrm{div}\,(\mathbf{q}) = 0 .$$

In this last case, we can say that the flow q is solenoidal. 


8.3 Thermodynamics - The Heat Conduction Equation 

Flow of heat is very similar to flow of fluid, and heat flow satisfies a similar
continuity equation. The flow is characterized by the heat current density $\mathbf{q}(\mathbf{r})$ (heat
flow per unit area and time), sometimes misleadingly called heat flux.

Assuming that there is no mass flow across the boundary of the control volume and 
no source of heat inside it, the rate of flow of heat out of the control volume by 
conduction must equal the rate of decrease of internal energy (constant volume) 
or enthalpy (constant pressure) within it. This leads to the equation 


$$\mathrm{div}\,\mathbf{q} = -\rho c \frac{\partial T}{\partial t}$$

where $\rho$ is the density of the conducting medium, $c$ its specific heat (both are
assumed constant) and $T$ is the temperature.

In order to solve for the temperature field another equation is required, linking $\mathbf{q}$
to the temperature gradient. This is

$$\mathbf{q} = -\kappa\, \mathrm{grad}\,T,$$

where $\kappa$ is the thermal conductivity of the medium. Combining the two equations
gives the heat conduction equation:

$$-\mathrm{div}\,\mathbf{q} = \kappa\, \mathrm{div}\,\mathrm{grad}\,T = \kappa \nabla^2 T = \rho c \frac{\partial T}{\partial t}$$

where it has been assumed that $\kappa$ is a constant. In steady flow the temperature
field satisfies Laplace's Equation $\nabla^2 T = 0$.


8.4 Mechanics - Conservative fields of force 

A conservative field of force is one for which the work done

$$\int_A^B \mathbf{F} \cdot d\mathbf{r}$$

moving from A to B is independent of the path taken. As we saw in Lecture 4, conservative
fields must satisfy the condition

$$\oint_C \mathbf{F} \cdot d\mathbf{r} = 0 .$$

Stokes' tells us that this is

$$\int_S \mathrm{curl}\,\mathbf{F} \cdot d\mathbf{S} = 0,$$

where $S$ is any surface bounded by $C$.

But if true for any $C$ containing A and B, it must be that

$$\mathrm{curl}\,\mathbf{F} = 0$$


Conservative fields are irrotational
All radial fields are irrotational

One way (actually the only way) of satisfying this condition is for

$$\mathbf{F} = \nabla U$$


The scalar field $U(\mathbf{r})$ is the Potential Function

8.5 The Inverse Square Law of force

Radial forces are found in electrostatics and gravitation — so they are certainly 
irrotational and conservative. 

But in nature these radial forces are also inverse square laws. One reason why this 
may be so is that it turns out to be the only central force field which is solenoidal, 
i.e. has zero divergence. 

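If $\mathbf{F} = f(r)\mathbf{r}$,

$$\mathrm{div}\,\mathbf{F} = 3f(r) + r f'(r).$$


For div $\mathbf{F} = 0$ we conclude

$$r\frac{df}{dr} + 3f = 0$$

or

$$\frac{df}{f} + 3\frac{dr}{r} = 0.$$

Integrating with respect to $r$ gives $f r^3 = \text{const} = A$, so that

$$\mathbf{F} = \frac{A\mathbf{r}}{r^3}, \qquad |\mathbf{F}| = \frac{A}{r^2} .$$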

The condition of zero divergence of the inverse square force field applies everywhere 
except at r = 0, where the divergence is infinite. 

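To show this, calculate the outward normal flux out of a sphere of radius $R$ centered
on the origin when $\mathbf{F} = F\hat{\mathbf{r}}$. This is

$$
\int_S \mathbf{F} \cdot d\mathbf{S} = F \int_{\text{Sphere}} \hat{\mathbf{r}} \cdot d\mathbf{S} = F \int_{\text{Sphere}} dS = F\, 4\pi R^2 = 4\pi A = \text{Constant}.
$$

Gauss tells us that this flux must be equal to

$$\int_V \mathrm{div}\,\mathbf{F}\; dV = \int_0^R \mathrm{div}\,\mathbf{F}\; 4\pi r^2\, dr$$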

where we have done the volume integral as a summation over thin shells of surface 
area 47rr 2 and thickness dr. 

But for all finite r, divF = 0, so divF must be infinite at the origin. 

The flux integral is thus 

• zero — for any volume which does not contain the origin 

• $4\pi A$ for any volume which does contain it.
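
The statement that the flux depends only on whether the volume contains the origin, and not on its shape, can be illustrated numerically. The sketch below is an assumption-laden example rather than part of the argument: it integrates the inverse-square field over the six faces of an axis-aligned cube with a midpoint rule. The names `inverse_square` and `flux_through_cube` and the grid size `n` are chosen for this illustration.

```python
import math

def inverse_square(x, y, z, A=1.0):
    """The field F = A r / r^3."""
    r3 = (x * x + y * y + z * z) ** 1.5
    return (A * x / r3, A * y / r3, A * z / r3)

def flux_through_cube(centre, half, n=100):
    """Outward flux of the inverse-square field through an axis-aligned cube.

    `centre` is the cube centre, `half` its half-side; each face is split
    into n x n squares and the field is sampled at their midpoints.
    """
    h = 2 * half / n
    total = 0.0
    for axis in range(3):
        u_axis, v_axis = [k for k in range(3) if k != axis]
        for sign in (-1.0, 1.0):                 # the two faces normal to this axis
            for i in range(n):
                for j in range(n):
                    p = list(centre)
                    p[axis] += sign * half
                    p[u_axis] += -half + (i + 0.5) * h
                    p[v_axis] += -half + (j + 0.5) * h
                    total += sign * inverse_square(*p)[axis] * h * h
    return total

if __name__ == "__main__":
    print(flux_through_cube((0.0, 0.0, 0.0), 1.0), 4 * math.pi)  # cube contains the origin
    print(flux_through_cube((3.0, 0.0, 0.0), 1.0))               # cube excludes the origin
```

With $A = 1$ the first value should be close to $4\pi$ and the second close to zero.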

8.6 Gravitational field due to distributed mass: Poisson’s Equation

If one tried the same approach as §8.5 for the gravitational field, $A = Gm$, where
$m$ is the mass at the origin and $G$ the universal gravitational constant, one would
run into the problem that there is no such thing as a point mass.

We can make progress though by considering distributed mass.

The mass contained in each small volume element $dV$ is $\rho\, dV$ and this will make a
contribution $-4\pi\rho G\, dV$ to the flux integral from the control volume. Mass outside
the control volume makes no contribution, so that we obtain the equation

$$\int_S \mathbf{F} \cdot d\mathbf{S} = -4\pi G \int_V \rho\, dV.$$

Transforming the left hand integral by Gauss' Theorem gives

$$\int_V \mathrm{div}\,\mathbf{F}\; dV = -4\pi G \int_V \rho\, dV$$

which, since it is true for any $V$, implies that

$$-\mathrm{div}\,\mathbf{F} = 4\pi\rho G.$$

Since the gravitational field is also conservative (i.e. irrotational) it must have
an associated potential function $U$, so that, with the usual sign convention for the
gravitational potential, $\mathbf{F} = -\mathrm{grad}\,U$. It follows that the
gravitational potential $U$ satisfies

Poisson’s Equation

$$\nabla^2 U = 4\pi\rho G .$$

Using the integral form of Poisson's equation, it is possible to calculate the
gravitational field inside a spherical body whose density is a function of radius only. We
have

$$4\pi R^2 F = 4\pi G \int_0^R 4\pi r^2 \rho\, dr,$$

where $F = |\mathbf{F}|$, or

$$|\mathbf{F}| = \frac{G}{R^2} \int_0^R 4\pi r^2 \rho\, dr = \frac{MG}{R^2}$$

where $M$ is the total mass inside radius $R$. For the case of uniform density, this is
equal to $M = \tfrac{4}{3}\pi\rho R^3$ and $|\mathbf{F}| = \tfrac{4}{3}\pi\rho G R$.
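
The integral form lends itself to direct numerical evaluation for any radial density profile. The sketch below is a minimal illustration using a midpoint rule; the function name, the callable `density` argument and the approximate value used for $G$ are assumptions made for the example.

```python
import math

G = 6.674e-11  # universal gravitational constant in SI units (approximate value)

def field_magnitude(R, density, n=1000):
    """|F| at radius R inside a spherically symmetric body.

    Evaluates (G / R^2) * integral from 0 to R of 4 pi r^2 rho(r) dr by the
    midpoint rule, where `density` is a function returning rho at radius r.
    """
    dr = R / n
    mass = 0.0
    for i in range(n):
        r = (i + 0.5) * dr
        mass += 4 * math.pi * r * r * density(r) * dr   # mass of a thin shell
    return G * mass / (R * R)

if __name__ == "__main__":
    rho, R = 5500.0, 6.0e6
    numeric = field_magnitude(R, lambda r: rho)
    uniform = 4.0 / 3.0 * math.pi * rho * G * R         # closed form for uniform density
    print(numeric, uniform)
```

For a uniform density the numerical value should match $\tfrac{4}{3}\pi\rho G R$ closely; a non-uniform profile only requires passing a different `density` function.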
8.7 Pressure forces in non-uniform flows

When a body is immersed in a flow it experiences a net pressure force

$$\mathbf{F}_p = -\int_S p\, d\mathbf{S}$$

where $S$ is the surface of the body. If the pressure $p$ is non-uniform, this integral
is not zero. The integral can be transformed using Gauss' Theorem to give the
alternative expression

$$\mathbf{F}_p = -\int_V \mathrm{grad}\,p\; dV ,$$

where $V$ is the volume of the body. In the simple hydrostatic case $p + \rho g z =$
constant, so that

$$\mathrm{grad}\,p = -\rho g \hat{\mathbf{k}}$$

and the net pressure force is simply

$$\mathbf{F}_p = g\hat{\mathbf{k}} \int_V \rho\, dV$$

which, in agreement with Archimedes’ principle, is equal to the weight of fluid
displaced.